Express Mail Label: EV329723961US

BIMODAL BURST SWITCHING 

FIELD OF THE INVENTION 

The present invention relates to a data network having a fast-switching optical
core and, in particular, to a high-performance burst-switching network having low
burst-transfer delay and zero burst loss.

BACKGROUND 

Prior-art optical burst switching has two main drawbacks: burst-transfer
latency and burst loss. In a closed-loop scheme, a source node sends a request to a
core node for transferring a burst, the request including a destination and size of the
burst, and waits for a message from the core node, where the message
acknowledges that the optical switch in the core node is properly configured, before
sending the burst. In an open-loop scheme, the burst follows the burst-transfer
request after a predetermined time period, presumably sufficient to schedule the
burst transfer across the core, and it is expected that, when the burst arrives at the
core node, the controller of the core node would have set up an internal path through
the optical switch to a target output port of the optical switch. The main drawback of
this technique is the uncertainty of the fate of a burst thus transmitted. Even if a very
long time gap is kept between a burst-transfer request and the data burst itself, the
lack of buffers at the core node may result in burst loss and a significant idle time.

Thus, in the closed-loop scheme, the time delay experienced in sending a
burst-transfer request and receiving an acceptance before sending a burst may be
unacceptably high, leading to idle waiting periods and low network utilization in
addition to requiring large storage at the edge nodes. In the open-loop scheme, a
burst may arrive at a core node before the optical switch can be configured to switch
the burst and the burst may be lost. Furthermore, the fact that the burst has been
lost at the core node remains unknown to the source node for some time and a lost
burst would have to be sent again after a predefined interval of time.

In a wide-coverage network, the round-trip propagation delay from an edge
node, comprising a paired source node and a sink node, to a core node can be of
the order of tens of milliseconds. This renders closed-loop burst scheduling
inappropriate. In closed-loop switching, a source node and a core node must
exchange messages to determine the transmission time of each burst. The high
round-trip delay requires that the source node have sizeable buffer storage. On the
other hand, open-loop burst scheduling, which overcomes the delay problem, can
result in substantial burst loss due to unresolved contention at the core nodes. It is
desirable that data-burst formation at the source nodes and subsequent transfer to
respective optical core nodes be performed with low delay, and that burst transfer
across the core be strictly loss-free. It is also desirable that the processing effort
and transport overhead be negligibly small.
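
By way of a non-limiting illustration, the buffer storage implied by closed-loop scheduling can be estimated as the data that accumulates at a source node during one round trip. The sketch below is only a back-of-envelope aid: the function name, the 10 Gb/s channel rate, and the 20 ms round-trip delay are assumptions chosen for illustration and do not appear in the specification.

```python
def closed_loop_buffer_bits(flow_rate_bps: float, round_trip_delay_s: float) -> float:
    """Bits that accumulate at a source node while it waits one round trip
    for the core node to acknowledge a burst-transfer request."""
    return flow_rate_bps * round_trip_delay_s


# Illustrative figures only (assumptions): a 10 Gb/s channel, 20 ms round trip.
bits = closed_loop_buffer_bits(10e9, 20e-3)
print(f"about {bits / 8 / 1e6:.0f} megabytes held per channel")
```

Under these assumed figures the source node would hold roughly 25 megabytes per channel, which indicates why a round-trip delay of tens of milliseconds calls for sizeable buffers.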

Applicant's United States patent application 09/750,071, filed on December
29, 2000 and titled "Burst Switching in a High-Capacity Network", discloses a
method of burst switching where burst-transfer requests from edge nodes are sent to
a core-node controller which determines a schedule for conflict-free burst switching
through the core node. Scheduling information is distributed to the sources of the
burst-transfer requests and to a configuration controller of the core node. Instead of
handling burst requests one-by-one, burst requests are pipelined and the handling of
the bursts is scheduled over a future period, thus realizing efficient utilization of
network resources.

Applicant's copending United States patent application, serial number
10/054,509, filed on November 13, 2001 and titled "Time-Coordination in a
Burst-Switching Network", discloses a method and apparatus for low-latency loss-free
burst switching. Burst schedules are initiated by controllers of bufferless core nodes
and distributed to respective edge nodes. Burst formation takes place at source
nodes and a burst size is determined according to an allocated flow rate of a burst
stream to which the burst belongs. An allocated flow rate of a burst stream may be
modified according to observed usage of scheduled bursts of a burst stream. A
method of control-burst exchange between each of a plurality of edge nodes and
each of a plurality of bufferless core nodes enables burst scheduling, time
coordination, and loss-free burst switching.

The method of application 09/750,071 generally yields higher network
efficiency and is attractive when the propagation delay between an edge node and a
core node is relatively small, of the order of a millisecond for example, or when large
delay is acceptable. The method of application 10/054,509 is attractive when the
propagation delay between an edge node and a core node is relatively large.

Burst communication is preferably incorporated within a flexible time-shared
network that also provides both channel switching and time-division-multiplexed
switching. Providing high-capacity wide-coverage time-shared networks using
bufferless core nodes presents a significant challenge due to the difficulty of
time-alignment at the bufferless core nodes. Structures of time-shared networks that
facilitate extending network coverage and capacity are needed.

Furthermore, methods for burst switching adapted to take into account widely
varying inter-nodal propagation delays, different network topologies, and diverse
performance requirements are needed to realize a wide-coverage high-performance
time-shared network.

It is also desirable to explore means for dynamic division of a parent network
into embedded networks serving user communities having different service
specifications or requiring private control.

SUMMARY OF THE INVENTION 

To attain both low delay and zero burst loss in a burst-switching network
comprising edge nodes interconnected by bufferless core nodes, the proximity of an
edge node to the core node to which it is connected is used to determine whether to
schedule burst transfer from the edge node based on descriptors of waiting data
bursts or on anticipated data bursts having sizes determined according to prescribed
flow rates.

In accordance with an aspect of the present invention, there is provided a
method of transfer of a plurality of signal burst streams each comprising a
succession of signal bursts from a first node to a second node. According to the
method, the first node transmits first-mode requests to the second node, each
first-mode request including burst descriptors, and receives from the second node a
schedule including a transmission time corresponding to each of the burst
descriptors. The first node may also transmit second-mode requests to the second
node, each second-mode request including at least a flow-rate allocation, and
receive from the second node a succession of burst-transfer permits, each
burst-transfer permit including a burst-size limit and a corresponding transmission time.
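
By way of a non-limiting illustration, the messages exchanged in the two modes may be pictured as simple records. The sketch below is not the message format of the invention (the formats are those of FIGS. 31 and 32); every class name, field name, and unit is an assumption introduced only to make the two request/response pairs concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurstDescriptor:          # hypothetical record: one waiting burst
    destination: str
    size_bits: int


@dataclass
class FirstModeRequest:         # first node -> second node
    descriptors: List[BurstDescriptor]


@dataclass
class ScheduleEntry:            # second node -> first node, one per descriptor
    descriptor: BurstDescriptor
    transmission_time: int      # assumed unit: ticks of a time counter


@dataclass
class SecondModeRequest:        # first node -> second node
    destination: str
    flow_rate_bps: int          # requested flow-rate allocation


@dataclass
class BurstTransferPermit:      # second node -> first node, sent repeatedly
    burst_size_limit_bits: int
    transmission_time: int


def fits_permit(burst_bits: int, permit: BurstTransferPermit) -> bool:
    """A burst formed under a permit must not exceed the permitted size."""
    return burst_bits <= permit.burst_size_limit_bits
```

In the first mode the schedule answers descriptors of bursts that already exist, whereas in the second mode each permit bounds a burst that the first node has yet to form.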

In accordance with another aspect of the present invention, there is provided
a method of bimodal burst scheduling implemented in a core node of a network
comprising edge nodes interconnected by core nodes. The method comprises steps
of: receiving burst-transfer requests including waiting-burst descriptors; receiving a
flow-rate-allocation request for each of a plurality of burst streams; generating
time-regulated burst descriptors for each of the burst streams; merging the waiting-burst
descriptors and the time-regulated burst descriptors into a combined list of burst
descriptors for each outlet port of the core node; and computing a scheduled arrival
time at the core node for each burst in the combined list.
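
By way of a non-limiting illustration, the steps above can be sketched as follows. This is a deliberately simplified greedy scheduler, not the scheduler of the invention: the `Descriptor` fields, the fixed regulation interval, the single channel rate, and the first-fit assignment of arrival times are all assumptions made for the sake of a short runnable example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Descriptor:
    inlet: int          # inlet port of the core node
    outlet: int         # outlet port of the core node
    size_bits: int
    ready_time: float   # earliest possible arrival at the core node, seconds


def time_regulated_descriptors(inlet, outlet, flow_rate_bps, interval_s, count):
    """Generate descriptors of anticipated bursts for one flow-rate allocation."""
    size = round(flow_rate_bps * interval_s)
    return [Descriptor(inlet, outlet, size, k * interval_s) for k in range(count)]


def schedule(waiting, regulated, channel_rate_bps):
    """Merge both descriptor kinds per outlet port and assign arrival times
    so that no two bursts overlap on any inlet or outlet port."""
    combined = defaultdict(list)
    for d in list(waiting) + list(regulated):
        combined[d.outlet].append(d)

    inlet_free = defaultdict(float)   # time at which each inlet becomes free
    arrivals = []
    for outlet in sorted(combined):
        outlet_free = 0.0
        for d in sorted(combined[outlet], key=lambda x: x.ready_time):
            start = max(d.ready_time, inlet_free[d.inlet], outlet_free)
            end = start + d.size_bits / channel_rate_bps
            inlet_free[d.inlet] = outlet_free = end
            arrivals.append((d, start))
    return arrivals
```

The greedy pass trades efficiency for brevity; it leaves gaps that a practical scheduler would try to fill, but it does show waiting-burst descriptors and time-regulated descriptors sharing one combined list per outlet port.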

In accordance with a further aspect of the present invention, there is provided
a bimodal core controller associated with a core node in a network comprising edge
nodes interconnected through core nodes. The bimodal core controller comprises: a
burst-descriptor generator operable to generate burst descriptors; a burst scheduler
operable to schedule the transfer of bursts across the core node; and a
burst-transfer-request processor. The burst-transfer-request processor is operable to
receive burst-transfer requests, associate each burst-transfer request with a first
mode or a second mode, direct each burst-transfer request associated with the first
mode to the burst scheduler, and direct each burst-transfer request associated with
the second mode to the burst-descriptor generator to generate burst descriptors then
direct generated burst descriptors to the burst scheduler.

In accordance with a further aspect of the present invention, there is provided
a bimodal burst switching network comprising a plurality of edge nodes each having
an edge controller, the plurality of edge nodes connecting to at least one core node
wherein at least one edge node from the plurality of edge nodes is operable to:
determine its proximity to a selected core node; select a first mode of burst transfer
when the proximity is within a predefined proximity threshold; and select a second
mode of burst transfer when the proximity is beyond the predefined proximity threshold.
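
By way of a non-limiting illustration, the mode selection can be sketched as a single comparison. The sketch assumes that proximity is measured as one-way propagation delay and that the threshold is one millisecond; both are assumptions suggested by the background discussion rather than values fixed by this aspect, and the names are hypothetical.

```python
from enum import Enum


class BurstMode(Enum):
    FIRST = 1    # schedule from descriptors of waiting bursts
    SECOND = 2   # schedule from flow-rate allocations (burst-transfer permits)


def select_mode(one_way_delay_s: float, threshold_s: float = 1e-3) -> BurstMode:
    """Pick the burst-transfer mode from the edge node's proximity to a core node."""
    return BurstMode.FIRST if one_way_delay_s <= threshold_s else BurstMode.SECOND


# An edge node may use different modes toward different core nodes.
print(select_mode(0.2e-3))   # nearby core node
print(select_mode(15e-3))    # distant core node
```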

In accordance with a further aspect of the present invention, there is provided
a method of bimodal burst scheduling implemented by a first edge node selected
from among a plurality of edge nodes interconnected by a plurality of independent
bufferless core nodes, where each edge node has an edge-node controller and each
core node has a core-node controller. The method comprises steps of determining a
total flow-rate requirement for a connection from the first edge node to at least one
destination edge node and selecting a core node from among the core nodes
through which to route the connection. Where the selected core node is within a
predefined proximity from the first edge node, the method includes steps of forming a
data burst according to the total flow-rate requirement, sending burst-transfer
requests, each specifying a destination edge node and a burst size, to the core
controller, and receiving from the core controller a schedule for burst transmission.
Where the selected core node is beyond the predefined proximity from the first edge
node, the method includes steps of sending requests for flow-rate allocations, each
request specifying a destination edge node and a flow-rate requirement, and
receiving from the core node burst-transfer permits each specifying a burst-size limit,
a destination, and a corresponding transmission time.

BRIEF DESCRIPTION OF THE DRAWINGS 

Embodiments of the present invention will be further described with reference
to the accompanying exemplary drawings, in which:

FIG. 1 illustrates a prior-art network comprising electronic edge nodes 
interconnected by optical core nodes; 

FIG. 2 illustrates a prior-art network comprising electronic edge nodes
interconnected by core nodes with each port of each core node provided with a
buffer;

FIG. 3 illustrates a network comprising electronic edge nodes interconnected 
by core nodes with buffers provided only at inward ports of the core nodes in 
accordance with an embodiment of the present invention;

FIG. 4 illustrates a network comprising electronic edge nodes interconnected
by core nodes with buffers provided only at outward ports of the core nodes in 
accordance with an embodiment of the present invention; 

FIG. 5 illustrates a network comprising electronic edge nodes interconnected
by core nodes with completely bufferless core nodes for use with an embodiment of
the present invention;

FIG. 6 illustrates a segment of a network of the structure of FIG. 3 where a
first core node is connected to a second core node by a bufferless path for use with
an embodiment of the present invention;

FIG. 7 illustrates paths traversing a single core node, two core nodes, and
three core nodes in the network of FIG. 3;

FIG. 8 illustrates paths traversing a single core node, two core nodes, and 
three core nodes in the network of FIG. 4; 

FIG. 9 illustrates paths traversing a single core node, two core nodes, and
three core nodes in the network of FIG. 5;

FIG. 10 illustrates a mechanism for time-locking an edge node to a core node 
for use in an embodiment of the present invention; 

FIG. 11 illustrates a prior-art network having a plurality of edge nodes
interconnected by independent bufferless core nodes;

FIG. 12 illustrates time-division-multiplexing, burst transmission, and
interleaved burst and TDM data transfer in accordance with an embodiment of the
present invention;

FIG. 13 illustrates a process of burst scheduling in a switch fabric having five 
input ports and five output ports for use in an embodiment of the present invention; 

FIG. 14 illustrates a process of interleaved burst and TDM scheduling in a
switch fabric having five input ports and five output ports in accordance with an
embodiment of the present invention;

FIG. 15 illustrates an edge node adapted to transmit and receive data bursts 
from core nodes in accordance with an embodiment of the present invention;

FIG. 16 illustrates a core node receiving and transmitting control signals over
separate spectral bands for use in an embodiment of the present invention; 

FIG. 17 illustrates a core node receiving and transmitting control signals 
interleaved with payload signals for use in an embodiment of the present invention; 

FIG. 18 illustrates a prior-art core node in the network of FIG. 2;

FIG. 19 illustrates a core node in the network of FIG. 3 in accordance with an 
embodiment of the present invention; 

FIG. 20 illustrates a core node in the network of FIG. 4 in accordance with an 
embodiment of the present invention; 

FIG. 21A illustrates signal switching from an inward port provided with a buffer
to a plurality of outlet ports in an optical switching fabric in accordance with an
embodiment of the present invention;

FIG. 21B illustrates signal switching from a bufferless inward port to a plurality
of outlet ports in an optical switching fabric in accordance with an embodiment of the
present invention;

FIG. 22 illustrates an arrangement for connecting optical switches each
comprising a plurality of switch fabrics with selected inward ports provided with buffers
in accordance with an embodiment of the present invention;

FIG. 23 illustrates time-locking of a path traversing two core nodes of the type
illustrated in FIG. 19 in accordance with an embodiment of the present invention;

FIG. 24 illustrates time-locking of a path traversing two core nodes of the type 
illustrated in FIG. 20 in accordance with an embodiment of the present invention; 

FIG. 25 illustrates a mesh network interconnecting core nodes having inward 
buffers in accordance with an embodiment of the present invention; 

FIG. 26 illustrates a mesh network interconnecting core nodes having outward
buffers in accordance with an embodiment of the present invention;

FIG. 27 illustrates a bimodal burst switching network as viewed from a core 
node, in accordance with an embodiment of the present invention;

FIG. 28 illustrates a bimodal burst switching network as viewed from an edge
node, in accordance with an embodiment of the present invention; 

FIG. 29 illustrates a first mode of burst transfer from an edge node to two core
nodes;

FIG. 30 illustrates the use of the first mode of burst transfer from an edge
node to a first core node and a second mode of burst transfer from the edge node to
a second core node in accordance with an embodiment of the present invention;

FIG. 31-A illustrates the format of a first-mode communication request and
FIG. 31-B illustrates the format of a second-mode communication request in a
bimodal burst-switching network in accordance with an embodiment of the present
invention;

FIG. 32A illustrates a burst-schedule sent by a core node to an outbound port
of an edge node and FIG. 32B illustrates a burst-transfer permit sent by a core node
to an outbound port of an edge node;

FIG. 33 illustrates a bimodal burst-switching mechanism provided at a core
node, in accordance with an embodiment of the present invention;

FIG. 34 is a flow chart describing the operation of the mechanism of FIG. 33; 

FIG. 35 illustrates a mechanism provided at an outbound port of the edge
node of FIG. 15 to facilitate selective burst switching in accordance with an
embodiment of the present invention;

FIG. 36 illustrates a network comprising edge nodes and core nodes with a 
virtual-network controller connecting to one of the edge nodes in accordance with an 
embodiment of the present invention; 

FIG. 37 illustrates an exemplary flow-rate-allocation matrix for a subset of the
network of FIG. 36, for use with an embodiment of the present invention;

FIG. 38 illustrates a virtual network embedded in the parent network of FIG.
36 in accordance with an embodiment of the present invention;

FIG. 39 illustrates a second virtual network embedded in the parent network 
of FIG. 36;

FIG. 40 illustrates steps of bimodal-burst-switching control associated with a
multiple-source connection request, in accordance with an embodiment of the 
present invention; 

FIG. 41 illustrates structures of messages sent by an edge node to an optical
core node to request connections according to a first mode of burst switching and a
second mode of burst switching with connections of the second mode including
nodal connections and network connections in accordance with an embodiment of
the present invention; and

FIG. 42A and FIG. 42B illustrate example connection requests using the
format of FIG. 41.

DETAILED DESCRIPTION 

The terminology used in describing the embodiments of the invention is listed below.

Edge node: A switching node having subtending information sources and sinks and
connecting to other nodes is called an edge node.

Source node: An edge node transmitting signals, received from subtending sources, 
to other nodes is called a source edge node or a source node. 

Sink node: An edge node receiving signals from other nodes, for delivery to 
subtending sinks, is called a sink edge node or a sink node. 

Core node: A switching node connecting only to other switching nodes, which may
be edge nodes or core nodes, is called a core node.

Input port: A port of a switching node receiving information signals from either a
subtending information source or from an external node is called an input port.

Output port: A port of a switching node transmitting information signals to either a
subtending information sink or an external node is called an output port.

Outer port: An edge-node port receiving signals from a source, or transmitting 
signals to a sink, is called an outer port.

Inner port: An edge-node port receiving signals from, or transmitting signals to,
another switching node is called an inner port. 

Ingress port: An input port of an edge node receiving information signals from
subtending information sources is referenced as an ingress port. An ingress port is
an outer port.

Egress port: An output port of an edge node transmitting information signals to 
subtending information sinks is referenced as an egress port. An egress port is an 
outer port. 

Inbound port: An input port of an edge node receiving information signals from
external switching nodes is referenced as an inbound port. An inbound port is an
inner port.

Outbound port: An output port of an edge node transmitting information signals to
external switching nodes is referenced as an outbound port. An outbound port is an
inner port.

Inbound channel: An inbound channel is a communication channel, usually a
wavelength channel in a fiber-optic link, connecting an inbound port to an external
node.

Outbound channel: An outbound channel is a communication channel, usually a
wavelength channel in a fiber-optic link, connecting an outbound port to an external
node.

Inlet port: An input port, of a core node, connecting to an outbound channel is herein 
called an inlet port. 

Outlet port: An output port, of a core node, connecting to an inbound channel is 
herein called an outlet port. 

Inward port: An input port, of a core node, receiving a wavelength channel from
another core node is called an inward port.

Outward port: An output port, of a core node, having a wavelength channel to 
another core node is called an outward port.

Inward and outward channels: A channel connecting a first core node to a second
core node is an inward channel with respect to the second core node and an
outward channel with respect to the first core node.

Uplink: An uplink is a communication link, usually a multiple-channel link, from an
edge node to a core node.

Downlink: A downlink is a communication link, usually a multiple-channel link, from a 
core node to an edge node. 

Up-channel: An up-channel is a channel, usually a wavelength channel, within an
uplink.

Down-channel: A down-channel is a channel, usually a wavelength channel, within a
downlink.

Core link: A core link is a link connecting two core nodes. A core link is preferably a 
WDM fiber link. 

Core channel: A channel in a core link is a core channel. 

Upstream: The adjective 'upstream' refers to a flow in the direction from an edge
node to a core node.

Downstream: The adjective 'downstream' refers to a flow in the direction from a core 
node to an edge node. 

Data packet: A data packet is a conventional data block of arbitrary size having an
identifying header.

Data burst: A data burst is an aggregation of data packets having a burst header in 
addition to the individual packet headers; a data burst may contain only one packet 
of a large size, in which case only the burst header is required. 

Data stream: A data stream is a flow of data units having the same destination edge
node and, possibly, assigned to the same route towards the destination node.

Packet stream: A packet stream is a data stream where the data units are data 
packets generally of variable and arbitrary sizes.

Burst stream: A burst stream is a data stream in which data units are aggregated into
data bursts. Where distinction is not required, the terms 'data stream', 'packet
stream', and 'burst stream' may be used interchangeably.

Flow rate: The mean rate, usually in bits per second, of a data stream of any data
format is the flow rate of the data stream.

Time-limited signal: A signal occupying a channel during a relatively short period of 
time, 16 microseconds for example, is called a time-limited signal. Successive time- 
limited signals carried by a channel may be directed to different destinations. 

Time locking: Time locking is a technique for time coordination using time counters to
enable time alignment of signals received at connecting nodes. A first node is time-locked
to a second node if a signal transmitted at an instant of time indicated by a time counter
at the first node arrives at the second node at the same instant of time as indicated
by an identical time counter at the second node. When a first node is time-locked to
a second node along a given path, the given path is said to be time-locked. The path
may include multiple wavelength channels, in which case, due to dispersion, each of
the channels may be time-locked separately.

Wavelength router: A wavelength router is an optical device, well-known in the art,
which has a set of WDM input ports and a set of WDM output ports, and which
connects wavelength channels received at an input port to corresponding output
ports. The connection pattern is static, typically based on a spatial cyclic mapping of
input wavelengths to output port numbers.

Parent network and embedded networks: A network divided into two or more
independent networks is called a parent network. A parent network may be shared
by several embedded networks each having its own control system. An embedded
network, also called an independent network or a virtual network, comprises edge
nodes, a dynamic share of the core of the parent network, and an exclusive
configuration controller.



Optical-Core Network




FIG. 1 illustrates a network 100 comprising electronic edge nodes 110
interconnected by optical core nodes 120. Each edge node 110 receives data from
subtending data sources (not illustrated) and organizes the received data into data
blocks. The data blocks may be of equal sizes, or bursts of adaptive sizes. It is also
possible that an edge node 110 organizes data in both fixed
time-division-multiplexed (TDM) data blocks and bursts of adaptive sizes. The data blocks and
bursts modulate an optical carrier of a predetermined wavelength and the resulting
modulated optical carrier is transmitted along a wavelength channel to an optical
core node 120. The wavelength channel is time shared so that successive data
blocks or successive bursts carried by the channel may have different destination
edge nodes and, hence, may exit the core node 120 from different output ports. An
edge node is preferably sufficiently flexible to handle a mixture of TDM and burst
data and to provide flow-rate control for burst streams. A universal electronic
switching node adapted to serve as an edge node in a high-capacity network with an
optical core is described in Applicant's United States patent application 10/025,982
filed on December 26, 2001 and titled "Universal edge node", the specification of
which is incorporated herein by reference. The universal edge node may handle a
variety of traffic classes and may control traffic admission, connection definition,
connection routing, and core node configuration. The provided capabilities
significantly simplify network operation and control.

The core node 120 is preferably bufferless for two reasons: firstly, to avoid
the need for buffer management in the core, where overflow can result in data loss,
and, secondly, to enable the introduction of photonic switches in the core or the
replacement of existing electronic core switches with photonic switches.

When several edge nodes transmit optical signals to a core node 120 in a
time-shared mode, either TDM or burst multiplexing, they must observe a strict
timing discipline because of the absence of signal buffers at the optical core node 120.
Such a process is preferably implemented by letting the core node 120 determine
the required arrival time for each data block or burst. Each edge node 110 must then
establish a precise estimate of the one-way propagation delay to each core node
120 to which it connects. Such a process is herein called "time locking".
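
By way of a non-limiting illustration, once an edge node has estimated its one-way propagation delay to a core node, it can convert an arrival time required by the core node into a local transmission instant. The sketch below assumes integer time-counter readings in microseconds and a cyclic counter of hypothetical period `counter_period`; neither the units nor the function name are taken from the specification.

```python
def transmit_instant(required_arrival: int, one_way_delay: int, counter_period: int) -> int:
    """Counter reading at which the edge node must start transmitting so that
    the signal reaches the core node at `required_arrival` (all in microseconds).
    The counter is assumed to wrap around every `counter_period` microseconds."""
    return (required_arrival - one_way_delay) % counter_period


# Assumed figures: 5 ms one-way delay, counter wrapping every second.
print(transmit_instant(12_000, 5_000, 1_000_000))
print(transmit_instant(1_000, 5_000, 1_000_000))
```

The second call shows the wrap-around case, where the required transmission instant falls in the preceding counter cycle.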

Time-shared optical core

Time-sharing the optical-core capacity can significantly reduce the mean
number of hops per connection, thus increasing the network efficiency. Time-sharing
also enables multi-grained connections where the capacity of a channel can be
divided, in different proportions, among several users.

A time-shared channel carries successive time-limited signals that may have
different destinations. The time-limited signals may be of equal or arbitrary
time-varying sizes. If the signals are of equal sizes, a TDM frame comprising a number of
time slots may be used, with each signal occupying a time slot. A connection is
allocated at least one time slot per TDM frame and the identity of a connection can
be derived from the time slots it occupies. When a connection is allocated two or
more time slots per TDM frame, the allocated time slots may occupy arbitrary
positions in the TDM frame. To establish a multiple-time-slot connection between an
input port and an output port of a bufferless node, such as an optical core node 120,
a time-slot-matching process is required to find contemporaneous unassigned time
slots. The connection set-up rate decreases, and the processing effort per
connection increases, as the number of time slots per connection increases.
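
By way of a non-limiting illustration, a time-slot-matching process can be sketched as a search for slots that are unassigned at both the input port and the output port during the same position of the TDM frame. The first-fit policy, the list-of-booleans occupancy representation, and the function name are assumptions made for brevity.

```python
from typing import List, Optional


def match_time_slots(inlet_busy: List[bool], outlet_busy: List[bool],
                     slots_needed: int) -> Optional[List[int]]:
    """Return `slots_needed` slot indices free on both ports and mark them
    busy, or return None (leaving both ports unchanged) if too few exist."""
    free = [t for t, (a, b) in enumerate(zip(inlet_busy, outlet_busy))
            if not a and not b]
    if len(free) < slots_needed:
        return None
    chosen = free[:slots_needed]          # first-fit selection
    for t in chosen:
        inlet_busy[t] = outlet_busy[t] = True
    return chosen


inlet = [True, False, False, True, False, False]
outlet = [False, False, True, False, False, True]
print(match_time_slots(inlet, outlet, 2))   # slots free on both ports
print(match_time_slots(inlet, outlet, 1))   # no common free slot remains
```

Each additional slot requested narrows the set of acceptable frames, which is consistent with the observation that set-up rate falls as the number of time slots per connection grows.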

If the time-limited signals are of variable size, connections of a wide range of
flow rates can be accommodated, and the duration of the time-limited signals can be
adapted to the flow rate. For example, the burst size may be 2000 bits at a flow rate
of 10 Megabits per second (Mb/s) and 20000 bits at a flow rate of 100 Mb/s. A
method of computing the burst size as a function of the flow rate of a corresponding
burst stream is described in Applicant's United States patent application 10/054,509,
filed on November 13, 2001 and titled "Time-Coordination in a Burst-Switching
Network", the specification of which is incorporated herein by reference.
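
By way of a non-limiting illustration, the two figures quoted above are consistent with a burst that carries the data accumulated over a fixed interval of 200 microseconds. The sketch below assumes that proportional rule; the interval is inferred from the example only, and the method of the referenced application may well differ.

```python
def burst_size_bits(flow_rate_bps: int, interval_us: int = 200) -> int:
    """Burst size as the data accumulated at `flow_rate_bps` over a fixed
    interval (an assumed rule; 200 microseconds is inferred from the example)."""
    return flow_rate_bps * interval_us // 1_000_000


for rate in (10_000_000, 100_000_000):
    print(rate, "b/s ->", burst_size_bits(rate), "bits")
```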

Time-sharing the optical-core capacity requires fast and scalable optical
switches. Applicant's United States patent application 10/390,880 filed on March 19,
2003 and titled "Fast optical switch" describes an optical switch in which switching is
effected by rapid wavelength conversion. The described optical switch handles both
wavelength channels and TDM optical signals but can be adapted to switch optical
bursts of variable sizes. A modular structure may be configured to comprise several
fast optical switches to yield a high-speed, high-capacity, fully-connected optical
switch.

Circumventing the limitation of optical switching 

Electronic switching nodes may have random-access data buffers. This
capability enables temporal decoupling of the operation of successive switching
nodes, thus significantly simplifying network control. With temporal decoupling, the
switch nodes may have independent time references. Currently, random-access
buffers are not easily realizable in photonic switches. This limitation dictates that the
edge nodes have time-locked paths to the optical core nodes. A path originating from
an edge node having a data buffer and traversing only one bufferless node, such as
an optical node, can always be time-locked. Time-locking paths each successively
traversing more than one optical node cannot, however, be realized network-wide.

High-capacity edge nodes and core nodes enable the realization of rational
network structures that avoid multiple hops. High-capacity nodes, both electronic (at
the edge) and optical (at the core), require high-speed schedulers. High-speed
high-throughput schedulers can be realized as an array of pipelined scheduler modules.
Pipelining also results in a significant reduction of contention loss, hence high
switching-node utilization. A pipelined scheduler for a high-capacity switching node is
disclosed in the aforementioned Applicant's United States Patent Application
10/025,982 and Applicant's United States Patent Application 10/223,222 filed August
20, 2002 and titled "Modular High-Capacity Switch", the specification of which is
incorporated herein by reference.

Decoupled Nodes 

FIG. 2 illustrates a prior-art network 200 comprising edge nodes 110
interconnected by core nodes 220. Data buffers are provided at each input port and
at each output port of each edge node and each core node. Providing such buffers
permits independent time-alignment at each edge or core node and, hence, a
complete decoupling of the nodes where independent time references may be used
and temporal coordination among nodes is not required. The edge nodes interface
with data sources and sinks (not illustrated) and are preferably based on an
electronic switch fabric as described above with reference to FIG. 1. A core node
that includes input buffers and output buffers may either use an electronic switch
fabric or an optical switch fabric. An electronic switch fabric interfacing with optical
links requires optical-to-electrical (O-E) conversion at each input port and
electrical-to-optical (E-O) conversion at each output port. A buffer at an input port or
an output port of an optical switch fabric can be realized using an O-E converter, a
data memory, and an E-O converter.

Six core nodes 220, labeled 220A, 220B, 220C, 220U, 220V, and 220W, are
illustrated. A core node may have several subtending edge nodes 110. The edge
nodes subtending to core node 220A are labeled 110A1, 110A2, etc., the edge
nodes subtending to core node 220B are labeled 110B1, 110B2, etc., and similarly
for the remaining core nodes. At least one upstream channel 212 and at least one
downstream channel 214 connect an edge node 110 to a core node 220. Core
channels 216 may connect a core node 220 to selected core nodes 220. For
example, core channel 216A1 connects an output port 222 of core node 220A to an
input port 221 of core node 220B.

Time-locked multi-hop path 

Time-locking an upstream channel from an edge node to an adjacent
bufferless core node can be realized through exchange of timing signals between a
controller of the edge node and a controller of the core node as described in United
States patent 6,570,872 issued to Beshai et al. on May 27, 2003, and titled "Self-
configuring distributed Switch", the specification of which is incorporated herein by
reference, and the aforementioned applicant's United States patent application
10/054,509. Time-locking the upstream channel enables the edge node to transfer
time-limited signals carried by the upstream channel to several output ports of a
bufferless core node.

A path from an edge node traversing a first core node towards a second core 
node can be time locked through an exchange of timing signals between a controller 
of the edge node and a controller of the second core node. The input ports of the first
core node may receive signals from several edge nodes and other core nodes.
Likewise, the input ports of the second core node may receive signals from different
edge nodes and other core nodes. Consequently, except by coincidence, the edge
node cannot time-lock to both the first and second core nodes due to the differing
propagation delays for different node pairs. However, time-locking the edge node to
the second core node can be consistently realized if the upstream channel from the
edge node to the first core node is switched exclusively to a core channel connecting
the first core node to the second core node. Likewise, the edge node can time lock
to a third core node over a path comprising several concatenated channels if each
channel, except the last one in the path, is switched exclusively to its subsequent
channel. Time-locking can then be realized with the controller of the edge node
exchanging timing messages with a controller of the third core node.

Partial Decoupling 

FIG. 3 illustrates a network 300 comprising optical core nodes 320 with some 
core nodes 320 having subtending edge nodes. Each core node 320 is provided 
with buffers, herein called inward buffers, at selected inward ports (an inward port is
an input port connecting to another core node). The core nodes of network 300
require O-E and E-O conversion only at the selected inward ports. There is no need
for buffers at any inlet port or any output port. 

Each edge node 310 is preferably time-locked to its adjacent core nodes, i.e., 
the core nodes to which it subtends. Scheduling the transfer of signals across a core 
node 320 is enabled by time-locking each channel connecting an outbound port of
an edge node to an inlet port of a core node 320 and time-alignment at each inward 
buffer of the core node 320 to a reference time of the core controller of the core node 
320. Time-locking the channels from outbound ports of edge nodes is feasible due to 
the availability of buffers at the outbound ports. 

At least one upstream channel 312 and at least one downstream channel 314
connect an edge node 310 to a core node 320 and core channels 316 may connect
a core node 320 to other selected core nodes 320. 

In order to switch time-limited signals received at the inward ports of a core 
node 320, the signals may be time-aligned at their respective inward ports. All signals
received at inlet ports and inward ports of a core node obey a time reference set by a
time counter associated with the core node. The inward buffers are used to realize
the required time alignment. Buffers are not required at inlet ports because the edge
nodes are time-locked to their adjacent core nodes. The use of inward buffers greatly 
facilitates the process of establishing a connection across the network because it 
decouples the successive channels of a path as will be described below. 
Additionally, the inward buffers may be exploited to facilitate the process of
scheduling. The signals received at the inward ports may be sorted at the inward
buffer according to their destination outlet ports towards sink edge nodes or outward 
ports towards other core nodes. 

FIG. 4 illustrates a network 400, similar to network 300, but each core node 
420 is provided with buffers, herein called outward buffers, only at selected outward
ports connecting to other core nodes. Elimination of the buffers at inlet ports is
realized by time-locking each channel connecting an outbound port of an edge node
to an inlet port of a core node. Elimination of buffers at inward ports is realized by
time locking a core channel connecting an outward buffer of a core node to a
subsequent core node. Time locking selected core channels is feasible due to the
availability of buffers at the selected outward ports of the core nodes. The core
nodes of network 400 require O-E and E-O conversion only at the selected outward
ports. At least one upstream channel 412 and at least one downstream channel 414
connect an edge node 410 to a core node 420 and core channels 416 may connect
a core node 420 to other selected core nodes 420.

The outward buffers serve the same purpose as the inward buffers of network
300 of FIG. 3. However, the process of time alignment is different. There is a core
channel from an outward port of a first core node to a corresponding inward port of a
second core node. The inward ports of the second core node may also receive
signals from outward ports of other geographically distributed core nodes. Each
outward buffer in an outward port has a buffer controller that includes a time counter.
The buffer controller exchanges timing signals (timing messages) with a controller of
the second core node. To enable switching time-limited signals through the second
core node, each outward port of any core node connecting to the second core node
is required to time-lock to the second core node.

To facilitate scheduling transfer of signals across two or more core nodes 420, 
the waiting signals in an outward buffer of a core node 420 may be sorted according
to the destined output ports of a subsequent core node 420, which include outlet
ports delivering signals to sink edge nodes subtending to the subsequent core node 
420 and outward ports connecting to other core nodes. 

Each core node 420 has a core controller (not illustrated in FIG. 4). The core 
controller will be described below with reference to FIG. 17. Subtending edge nodes
are preferably time-locked to the core nodes to which they connect as described with
reference to FIG. 3. A buffer at an outward port comprises an optical-to-electrical
(O-E) converter, an electronic memory, and an electrical-to-optical (E-O) converter.

The core controller includes a master time counter and a master timing circuit 
(not illustrated) adapted to transmit and receive timing signals. An outward port
provided with a buffer also includes a time-locking circuit adapted to exchange timing
signals with a core controller of another of the core nodes. The core controller is 
adapted to receive connection requests from, and transmit connection schedules to, 
subtending edge nodes. 

FIG. 5 illustrates a network 500 similar to the networks of FIGs. 3 and 4 but
none of the core nodes is provided with input or output buffers. Buffers are available
only at the edge nodes. Each outbound port of an edge node may be time locked to
a core node to which it connects (i.e., each upstream channel is time locked).
However, none of the core nodes is time locked to any other core node (i.e., none of
the core channels is time locked). A time-locked path can only be provided between
any two edge nodes subtending to the same core node. Establishing a time-slotted 
connection over a path traversing two or more core nodes is thus infeasible due to 
the difficulty of time locking the entire path. This forces rethinking the network 
structure to reduce or eliminate the need for cascade time locking. 

Data bursts, or time-slotted data, originating at the edge nodes of a core node
520 and destined to the plurality of edge nodes subtending to another core node
may be aggregated on a single channel that terminates on a selected one of the
destined edge nodes. The selected edge node then distributes the data to respective
edge nodes using time-locked paths through their common core node. This process
is further illustrated in FIG. 6.

Simple Matching

In a simple time-slot matching process, the occupancy states of two channels
along a path are compared during successive time slots and any time slot that is
found to be vacant in both channels is considered an allocable time slot. Hereinafter,
a simple time-slot-matching process is also called a first-order matching process.
First-order matching has been used extensively in circuit switching and several
techniques have been developed to increase the probability of successful matching. 
This process can be extended to handle time-intervals of variable duration. 
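
The first-order matching described above can be sketched as follows. This is a minimal illustration only: the function name, the boolean-list representation of channel occupancy, and the 8-slot frame are assumptions made for the example and do not appear in the specification.

```python
from typing import List


def first_order_match(busy_a: List[bool], busy_b: List[bool], needed: int) -> List[int]:
    """Return `needed` time-slot indices that are vacant in both channels.

    busy_a[i] / busy_b[i] is True when slot i of that channel is occupied.
    An empty list is returned when fewer than `needed` common vacancies
    exist, so the caller never receives a partial allocation.
    """
    if len(busy_a) != len(busy_b):
        raise ValueError("both channels must use the same frame length")
    # A slot is allocable only if it is vacant in both channels.
    free = [i for i, (a, b) in enumerate(zip(busy_a, busy_b)) if not a and not b]
    return free[:needed] if len(free) >= needed else []


# Hypothetical occupancy of an inbound and an outbound channel over one frame.
inbound = [True, False, False, True, False, False, True, False]
outbound = [False, False, True, True, False, True, False, False]
slots = first_order_match(inbound, outbound, needed=2)
```

Taking the earliest common vacancies is only one possible selection rule; the matching condition itself (vacant in both channels) is the part taken from the text.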

Compound Matching 

In a compound time-slot matching process, the occupancy states of three or 
more channels along a path are compared during successive time slots and any time
slot that is found to be vacant in all the channels of the path is considered an
allocable time slot that may be used for a connection. The order of a matching
process is defined as the number of occupancy-state comparisons per time slot.
Thus, a simple matching process is a first-order matching process. If successive
channels along the path are separated by signal buffers, the compound matching
process of an order G, where G is greater than unity, can be decomposed into G
stages, each stage requiring a first-order matching process. If the channels of the
path traverse bufferless core nodes, a compound matching process of order G
requires concurrent time slot availability in (G + 1) ports. As described in the
aforementioned Applicant's United States patent application 10/223,222, the

probability of successful compound matching decreases rapidly with the order of the 
compound matching process. The probability of successful matching becomes even 
lower with burst switching where the burst sizes are variable. 
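
To illustrate why success becomes less likely as the order grows, the sketch below implements a compound match over the (G + 1) ports of a bufferless path and evaluates the chance that a single slot is vacant in all of them. The probability formula rests on a simplifying assumption that is not made in the specification, namely that each port is occupied independently with the same probability in any slot; the function names are likewise illustrative.

```python
from typing import List, Sequence


def compound_match(occupancy: Sequence[Sequence[bool]]) -> List[int]:
    """Return the slots that are vacant in every port along a bufferless path.

    occupancy[k][t] is True when port k is busy during slot t. A compound
    match of order G is given G + 1 occupancy sequences.
    """
    frame = len(occupancy[0])
    if any(len(port) != frame for port in occupancy):
        raise ValueError("all ports must use the same frame length")
    return [t for t in range(frame) if not any(port[t] for port in occupancy)]


def slot_match_probability(p_busy: float, order: int) -> float:
    """Chance that one slot is vacant in all (order + 1) ports.

    Assumes independent occupancy with probability p_busy per port,
    an assumption introduced here for illustration only.
    """
    return (1.0 - p_busy) ** (order + 1)


# Second-order match: three ports must be vacant in the same slot.
common = compound_match([
    [False, True, False],
    [False, False, False],
    [True, False, False],
])
```

Under this assumption, with half of the slots busy at each port, a first-order match finds a given slot usable with probability 0.25, while a third-order match does so with probability 0.0625.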

In order to permit decomposition of a compound matching process into a 
series of simple (first-order) matching processes, decoupling buffers may be
provided at either inward ports or outward ports of a bufferless core node as
illustrated in FIGs. 3 and 4. Providing a decoupling buffer at a port of an optical core
node requires optical-to-electrical (O-E) conversion and electrical-to-optical (E-O)
conversion. A buffer controller is associated with each decoupling buffer in order to
control timing and scheduling.

FIG. 6 illustrates a segment of a network similar to network 300 but with two 
core nodes 320A and 320W interconnected by a core channel 316A2 without a
buffer at either end of the core channel. The path from core node 320A to 320W is
not time locked and must, therefore, terminate on a single edge node. The time-
slotted data, or burst data, originating from edge nodes 310A1, 310A2, etc., and
destined to edge nodes 310W1 to 310Wk, subtending to core node 320W, may be
switched through core node 320W to edge node 310W2, for example. Edge node
310W2 then distributes the time-slotted data, or burst data, destined to each of its 
neighbours through time-locked paths traversing core node 320W. It is noted that the 
example of FIG. 6 is relevant to any path that is not time locked in any of the 
networks of FIGs. 3, 4, or 5. 

It may be desirable to time lock an edge node 310 to a non-adjacent core
node 320 in order to enable time-switching of time-limited signals transmitted by the
edge node to output ports of the non-adjacent core node. For example, in order to
enable edge node 310A2 to send time-limited signals, such as time-slotted signals
in a TDM frame, or bursts of arbitrary sizes, to edge nodes 310W1, 310W2, ...,
310Wk, a time locked path is required from edge node 310A2 to core node 320W.
Such a path would be channel-switched in core node 320A, through internal channel 
625, so that core node 320W would appear as an adjacent core node to edge node 
310A2. 

FIG. 7 illustrates exemplary paths 710, 720, 730, and 740 in the network 300 
of FIG. 3. The illustrated path 710 connects two edge nodes 310A1 and 310A2
subtending the same core node 320A. Core node 320A may be receiving signals 
from other subtending edge nodes 310 and, hence, edge node 310A1 must be time 
locked to core node 320A. Transferring an optical burst across core node 320A 
requires a first-order temporal-matching process. 

The illustrated path 720 connects an edge node 310A1 subtending to a core
node 320A to an edge node 310B1 subtending to a core node 320B. Edge node
310A1 is time-locked to core node 320A. A decoupling buffer 322B1 is provided at
an inward port of core node 320B. The decoupling buffer 322B1 enables time-
alignment at the inward ports of core node 320B so that all input ports, including inlet
ports and inward ports, have a common time reference. Decoupling buffer 322B1
also provides temporal switching. Transferring an optical burst from edge node
310A1 to edge node 310B1 requires two independent first-order temporal matching
processes. One first-order matching process is performed by a controller of core
node 320A and the other is performed by a controller of core node 320B. Without
the decoupling buffer 322B1, time-alignment would be difficult to realize and, if time-
alignment can be realized, a second-order temporal matching process would be
required.

The illustrated path 730 connects an edge node 310A1 subtending to a core 
node 320A to an edge node 310V1 subtending to a core node 320V. Edge node 
310A1 is time-locked to core node 320A. A core channel 316-A2 connects core node 
320A to core node 320W and a core channel 316-W1 connects core node 320W to 

core node 320V. A decoupling buffer 322W1 is provided at an inward port of core
node 320W and a decoupling buffer 322V1 is provided at an inward port of core
node 320V receiving a core channel 316-W1 from core node 320W. Transferring an
optical burst from edge node 310A1 to edge node 310V1 requires three independent
first-order temporal matching processes, one across each of the three core nodes
320A, 320W, and 320V, performed by controllers of core nodes 320A, 320W, and
320V, respectively. Without the decoupling buffers 322W1 and 322V1, time-
alignment would be difficult to realize and a third-order temporal matching process
would be required. 

The illustrated path 740 connects edge node 310A1 to edge node 310V1 
through core nodes 320A, 320U, and 320V. Upstream channel 312-A1 from edge
node 310A1 is switched in its entirety to a bufferless outward port of core node 320A
connecting to core channel 316-A3 which leads to a bufferless inward port of core
node 320U. In establishing paths 710, 720, and 730, edge node 310A1 time-locks to
its adjacent core node 320A. In establishing path 740, edge node 310A1 time-locks
to core node 320U and, hence, may switch time-limited signals to several outlet ports
of core node 320U leading to edge nodes 310 subtending to core node 320U and to
an outward port of core node 320U. Core channel 316-U1 leading to core node
320V terminates in an inward buffer 322V2. It is noted that edge node 310A1 may
not, except by coincidence, be time-locked to core node 320V. The inward buffer
322V2 is controlled by a core controller (not illustrated in FIG. 7) of core node 320V.
The core controller determines a schedule for switching time-limited signals (such as
time-slots in a TDM frame or data bursts) from the inward port to output ports of core
node 320V, including the outlet port leading to edge node 310V1.

FIG. 8 illustrates exemplary paths 810, 820, 830, and 840 in the network 400 of
FIG. 4. The illustrated paths bear one-to-one correspondence to paths 710, 720,
730, and 740 of FIG. 7, and core channels 416 correspond to core channels 316.
Paths 810 and 710 are identical. 

The illustrated path 820 connects an edge node 410A1 subtending to a core 
node 420-A to an edge node 410B1 subtending to a core node 420-B. Edge node
410A1 is time-locked to core node 420-A. A decoupling buffer 422A1 is provided at
an outward port of core node 420-A. An outward-port controller (not illustrated) is
operable to cause time-locking of the decoupling buffer 422A1 to core node 420-B,
thus enabling time-alignment along core channel 416A1 so that a signal arriving at
core node 420-B along core channel 416A1 obeys a common time reference at core
node 420-B. Decoupling buffer 422A1 also provides temporal switching. As in path
720, transferring an optical burst from edge node 410A1 to edge node 410B1
requires two independent first-order temporal matching processes performed by a
controller of core node 420-A and a controller of core node 420-B. Without the
decoupling buffer 422A1, time-alignment would be difficult to realize and, if realized,
a second-order temporal matching process would be required. 

The illustrated path 830 connects an edge node 410A1 subtending to a core
node 420-A to an edge node 410V1 subtending to a core node 420-V. A core channel
416A2 connects core node 420-A to core node 420-W and a core channel 416W1
connects core node 420-W to core node 420-V. A decoupling buffer 422A2 is
provided at an outward port of core node 420-A and a decoupling buffer 422W1 is
provided at an outward port of core node 420-W. Transferring an optical burst from
edge node 410A1 to edge node 410V1 requires three independent first-order
temporal matching processes, one across each of the three core nodes 420-A, 420-W,
and 420-V, performed by controllers of core nodes 420-A, 420-W, and 420-V,
respectively. Without the decoupling buffers 422A2 and 422W1, time-alignment
would be difficult to realize and a third-order temporal matching process would be
required.

The illustrated path 840 connects edge node 410A1 to edge node 410V1
through core nodes 420-A, 420-U, and 420-V. Time-limited signals carried by
upstream channel 412-A1 from edge node 410A1 are switched to outward port 422A3
of core node 420-A. An outward port of core node 420-A is provided with an outward
buffer 422A3 and a buffer controller (not illustrated). Core channel 416A3 is switched
in its entirety through core node 420-U to a bufferless output port 422U3 connecting
to core channel 416U3 leading to a bufferless inward port of core node 420-V.
Through exchange of timing signals with a core controller of core node 420-V,
outward buffer 422A3 time-locks to core node 420-V. Thus, time-limited signals sent
from edge node 410A1 can be switched to any output port of core node 420-V,
including the outlet port connecting to downstream channel 414-V1 leading to edge
node 410V1.

FIG. 9 illustrates paths in a network 500 of FIG. 5. A path 910 from edge node 
510A1 to edge node 510A2 traverses a single core node 520A, a path 920 from 
15 edge node 510A1 to edge node.510B1 traverses two core nodes 520A and 520B. 
and a path from edge node 510A1 to edge node 510V1 traverses three core nodes 
520A, 520B, and 520V. In any path 910, 920, or 930. edge node 510A1 can only 
time-lock to one of the core nodes. 

Path setup 

The process of setting up a path 710 from edge node 310A1 to edge node
310A2 comprises steps of:

(1) time-locking edge node 310A1 to core node 320A, and

(2) allocating a time-interval in upstream wavelength channel 312A1
connecting edge node 310A1 to core node 320A and in downstream wavelength
channel 314A1 connecting core node 320A to edge node 310A2.

The process of setting up a path 720 from edge node 310A1 to edge node 
310B1 includes steps of: 

(1) time-locking edge node 310A1 to core node 320A, 

(2) allocating a first time-interval in two wavelength channels 312A1 and
316A1, where upstream wavelength channel 312A1 connects edge node 310A1 to
core node 320A and core wavelength channel 316A1 connects core node 320A to
core node 320B,

(3) independently allocating a second time-interval in downstream wavelength
channel 314A2 connecting core node 320B to edge node 310B1, and

(4) time-switching the first time-interval to the second time-interval at
decoupling buffer 322B1 placed at the inward port of core node 320B which
receives core wavelength channel 316A1.

The process of setting up a path 730 from edge node 310A1 to edge node 
310V1 comprises steps of: 

(1) time-locking edge node 310A1 to core node 320A,

(2) allocating a first time-interval in two wavelength channels 312A1 and
316A2, where upstream wavelength channel 312A1 connects edge node 310A1 to
core node 320A and core wavelength channel 316A2 connects core node 320A to
core node 320W,

(3) independently allocating a second time-interval in core wavelength
channel 316W1 connecting core node 320W to core node 320V,

(4) independently allocating a third time-interval in downstream wavelength
channel 314V1 connecting core node 320V to edge node 310V1,

(5) time-switching the first time-interval to the second time-interval at
decoupling buffer 322W1 placed at the inward port of core node 320W which
receives core wavelength channel 316A2, and

(6) time-switching the second time-interval to the third time-interval at
decoupling buffer 322V1 placed at the inward port of core node 320V which
receives core wavelength channel 316W1.
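
The setup of a path such as path 730 can be viewed as a chain of independent first-order matches, one per core node, with each decoupling buffer time-switching a signal from the interval chosen at one stage to the interval chosen at the next. The sketch below illustrates that view under simplifying assumptions that are not part of the specification: fixed-size time slots rather than intervals of variable duration, one slot per connection, and a `Stage` tuple and function names invented for the example.

```python
from typing import List, Optional, Tuple

# (occupancy on the input side, occupancy on the output side) at one core node.
Stage = Tuple[List[bool], List[bool]]


def schedule_decoupled_path(stages: List[Stage]) -> Optional[List[int]]:
    """Choose one slot per core node using independent first-order matches.

    Returns the chosen slot for each stage, or None if any stage has no
    slot vacant on both its input and output side.
    """
    chosen: List[int] = []
    for busy_in, busy_out in stages:
        slot = next(
            (t for t, (a, b) in enumerate(zip(busy_in, busy_out)) if not a and not b),
            None,
        )
        if slot is None:
            return None
        chosen.append(slot)
    return chosen


def time_switches(chosen: List[int]) -> List[Tuple[int, int]]:
    """Slot changes each decoupling buffer must perform between stages."""
    return list(zip(chosen, chosen[1:]))


# Hypothetical 4-slot occupancy at three successive core nodes.
stages = [
    ([True, False, False, False], [False, True, False, False]),
    ([False, False, False, False], [True, False, False, False]),
    ([False, False, True, False], [False, False, False, True]),
]
chosen = schedule_decoupled_path(stages)
switches = time_switches(chosen)
```

Because each stage is matched on its own, the three chosen slots need not coincide; the two buffers absorb the differences.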

The process of setting up a path 740 from edge node 310A1 to edge node
310V1 is quite similar to the process of setting up a path 720. The main difference is
that a wavelength channel from core node 320A is switched in its entirety, through 
core node 320U, to core node 320V, thus core node 320V becomes adjacent to core 
node 320A and path 740 effectively reduces to path 720.

The process of setting up a path 820 from edge node 410A1 to edge node
410B1 comprises steps of:

(1) time-locking edge node 410A1 to core node 420A,

(2) allocating a first time-interval in upstream wavelength channel 412A1
connecting edge node 410A1 to core node 420A,

(3) allocating a second time-interval in two wavelength channels, 416A1 and
414B1, core channel 416A1 connecting core node 420A to core node 420B and
downstream channel 414B1 connecting core node 420B to edge node 410B1,

(4) time-locking decoupling buffer 422A1, placed at the outward port of core
node 420A from which channel 416A1 originates, to core node 420B, and

(5) time-switching the first time-interval to the second time-interval at
decoupling buffer 422A1.

The process of setting up a path 830 from edge node 410A1 to edge node
410V1 includes steps of:

(1) time-locking edge node 410A1 to core node 420A,

(2) allocating a first time-interval in upstream wavelength channel 412A1
connecting edge node 410A1 to core node 420A,

(3) independently allocating a second time-interval in core wavelength
channel 416A2 connecting core node 420A to core node 420W,

(4) independently allocating a third time-interval in two wavelength channels,
416W1 and 414V1, core channel 416W1 connecting core node 420W to core node
420V and downstream channel 414V1 connecting core node 420V to edge node
410V1,

(5) time-locking decoupling buffer 422A2, placed at the outward port of core
node 420A from which channel 416A2 originates, to core node 420W,

(6) time-locking decoupling buffer 422W1, placed at the outward port of core
node 420W from which channel 416W1 originates, to core node 420V,

(7) time-switching the first time-interval to the second time-interval at
decoupling buffer 422A2, and

(8) time-switching the second time-interval to the third time-interval at
decoupling buffer 422W1.

The process of setting up a path 840 from edge node 410A1 to edge node 
410V1 is quite similar to the process of setting up a path 820. The main difference is 
that a wavelength channel from core node 420A is switched in its entirety, through
core node 420U, to core node 420V, thus core node 420V becomes adjacent to core
node 420A and path 840 effectively reduces to path 820. 

A time-locked wavelength channel from edge node 510A1 to non-adjacent
core node 520B may be realized by channel-switching through core node 520A and
timing-signal exchange between a controller of edge node 510A1 and a core
controller of core node 520B through upstream channel 512A1 and core channel
516A2. Likewise, a time-locked wavelength channel from edge node 510A1 to core
node 520W is realized by channel switching through core node 520A and timing-
signal exchange between the edge controller of edge node 510A1 and the core
controller of core node 520W through upstream channel 512A1 and core channel
516A3. A time-locked wavelength channel from edge node 510A1 to core node 520V
is realized by channel switching through core nodes 520A and 520W and timing-
signal exchange between the edge controller of edge node 510A1 and the core
controller of core node 520V through upstream channel 512A1 and core channels
516A3 and 516W1.

FIG. 10 illustrates a mechanism for time-locking an edge node 1010 to a core
node 1020. There is at least one channel 1012 from the edge node 1010 to core
node 1020 and at least one channel 1014 in the opposite direction. A controller 1050
of a core node 1020 includes a time counter of a predefined period and wordlength.
A wavelength channel 1012 connects an outbound port of edge node 1010 to an
inlet port of core node 1020. The outbound port includes a time counter having the
same period and wordlength as the time counter of the core controller 1050. An
exchange of time-counter readings enables edge node 1010 to time lock to the core
node 1020. Channel 1012 is then said to be time locked. The outbound ports of the
edge node 1010 may connect to different core nodes 1020 and each outbound port
may have its own time counter. The technique of time locking is described in the
aforementioned United States patent 6,570,872. Time locking, as described in the
above US patent, allows a node having a buffer to time lock to another node.
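
The exchange of time-counter readings may be pictured with the following sketch. It is one plausible reading of the exchange, offered as an assumption rather than as the procedure of the referenced patent: the edge port stamps a timing message with its own counter reading, the core controller notes its counter reading when the message arrives, and the edge port then shifts its counter by the difference, modulo the common period. The class, the function, the 8-bit wordlength, and the 20-tick propagation delay are all hypothetical.

```python
class TimeCounter:
    """Cyclic counter whose period is set by its wordlength (2**wordlength)."""

    def __init__(self, wordlength: int, reading: int = 0) -> None:
        self.period = 1 << wordlength
        self.reading = reading % self.period

    def advance(self, ticks: int) -> None:
        """Let time pass; the reading wraps at the period."""
        self.reading = (self.reading + ticks) % self.period

    def shift(self, offset: int) -> None:
        """Apply a one-off correction to the reading."""
        self.reading = (self.reading + offset) % self.period


def lock_offset(sent_reading: int, arrival_reading: int, period: int) -> int:
    """Correction that makes 'sent at reading t' mean 'arrives at core reading t'."""
    return (arrival_reading - sent_reading) % period


# Hypothetical scenario: unsynchronized counters and a 20-tick one-way delay.
edge = TimeCounter(wordlength=8, reading=250)
core = TimeCounter(wordlength=8, reading=10)
delay = 20

sent = edge.reading                 # edge stamps its timing message
edge.advance(delay)
core.advance(delay)                 # message reaches the core controller
offset = lock_offset(sent, core.reading, edge.period)
edge.shift(offset)                  # offset is constant, so it can be applied on receipt of the reply
```

After the shift, a signal transmitted when the edge counter reads a given value reaches the core node when the core counter shows the same value, which is the property the scheduler relies on.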

Control signals from an edge node to a bufferless core node are transmitted 
over a time-locked path. When the time-locked path carries time-limited signals such 
as bursts or signals confined in time-slots in a TDM frame, control signals are
interleaved with payload signals. A method of establishing and maintaining time
locking in a time-shared channel is described in the aforementioned United States
patent application 10/054,509, filed on November 13, 2001.

Burst-Switching Modes

Consider a first node sending burst streams to a second node, each burst
stream comprising a succession of data bursts. The first node may form data bursts
then send a descriptor of each data burst to the second node for scheduling the
data-burst transfer to specified output ports of the second node. The second node
may receive signals from other nodes and, hence, scheduling the transfer of signals
across the second node is required to avoid collision at its output ports. Scheduling
information must be sent back to the first node if the second node is not provided
with signal buffers at its input. According to a first mode of burst transfer (also called
mode-0), the first node may choose to form bursts then send a descriptor of each
burst to the second node, the descriptor including a burst size and a burst
destination. The second node responds by sending an indication of the time at which
the burst must be received at the second node. Alternatively, according to a second
mode of burst transfer (also called mode-1), the first node may choose to send a
flow-rate-allocation request for a burst stream to the second node and the second
node may then send back burst permits, each burst permit including a burst-size
upper bound, herein also called a nominal size, and an instant of time at which the
burst must be received at the second node. The first node may choose to use the
first mode for some burst streams and the second mode for other burst streams.
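
The messages exchanged in the two modes can be summarized as simple records. The sketch below is illustrative only: the class and field names, the units (bytes, bits per second, counter ticks), and the stream identifier are assumptions; the specification states only that a mode-0 descriptor carries a burst size and destination, that the mode-0 response indicates a required arrival time, and that a mode-1 permit carries a burst-size upper bound and a required arrival time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstDescriptor:
    """Mode-0 request: sent after the burst has been formed."""
    size_bytes: int
    destination: str


@dataclass(frozen=True)
class BurstSchedule:
    """Mode-0 response: when the burst must be received at the second node."""
    arrival_time: int


@dataclass(frozen=True)
class FlowRateRequest:
    """Mode-1 request: a flow-rate allocation for a whole burst stream."""
    stream_id: str
    rate_bps: int


@dataclass(frozen=True)
class BurstPermit:
    """Mode-1 response: nominal (upper-bound) size and required arrival time."""
    nominal_size_bytes: int
    arrival_time: int


def fits_permit(burst_size_bytes: int, permit: BurstPermit) -> bool:
    """A burst may use a permit only if it does not exceed the nominal size."""
    return burst_size_bytes <= permit.nominal_size_bytes


permit = BurstPermit(nominal_size_bytes=4000, arrival_time=128)
```

In mode-0 the burst exists before the request is made, so its exact size is known; in mode-1 the permit arrives first and the first node forms a burst no larger than the nominal size.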

Each data packet received at an input port of an edge node is switched to an
output port selected according to the packet's destination. An output port can be an
egress port connecting to subtending data sinks or an outbound port connecting to
one of the core nodes. An outbound port connects to a specific core node and the
destination of the packet determines an output port of the specific core node. The
packets received at input ports of an edge node may be grouped into data streams
defined according to their destination sink node. A destination sink node, in turn, is
reached through at least one output port of a core node. The packet streams defined
at input may be rate regulated. At an outbound port of the edge node, data packets
received from several input ports may be grouped into data bursts where all
constituent packets of a data burst are directed to the same output port of the core
node to which the outbound port connects. At the outbound port of the edge node,
data bursts may be grouped into two categories according to the desired switching
mode. In the first switching mode, a request to schedule a burst is sent to a core
node only after the burst is formed. In the second mode, the core node to which the
outbound port is connected sends burst-transfer permits to the edge node.
Messages from a core node to inbound ports of edge nodes are carried by
downlinks.

FIG. 11 illustrates a composite-star network comprising a plurality of edge
nodes 1110 interconnected by a plurality of independent core nodes 1120, two of
which are illustrated. The core nodes 1120 are bufferless and may comprise
electronic space switches or optical space switches. Controllers of edge nodes 1110
communicate with controllers of core nodes 1120 to enable time locking the edge
nodes to their adjacent core nodes. 

An upstream channel 1112 connecting an edge node 1110-A to a specific
core node, 1120-X for example, may be a channel in an upstream link connecting
the edge node 1110-A to the core node 1120-X or a channel in a link from the edge
node 1110-A that is switched through another core node 1120 to the specific core
node 1120-X. Likewise, a downstream channel 1114 from a core node 1120 to an
edge node 1110-A may be a direct channel or a channel switched through another
core node 1120. 

An edge node 1110 may connect to selected core nodes 1120. The round-trip
propagation delays between the edge node 1110 and the individual core nodes
1120 may vary significantly. An outbound-port controller (not illustrated) of the edge
node 1110 may determine the burst-switching mode to be used with respect to each
core node 1120. The edge node 1110 has at least one outbound port linked to each
of the selected core nodes 1120, and the burst formation at each outbound port
depends on the mode of burst switching used. The selection of the burst-switching
mode may be based on traffic classes. An outbound port may use the first mode for
delay-tolerant traffic and the second mode for delay-sensitive traffic. The mode 
selection may also be based on the proximity of the core node to the edge node as 
will be described below. 

To facilitate time-division-multiplexed (TDM) sharing of channel capacity 
among several connections, a conventional slotted TDM frame may be used (FIG.
12A). A connection may be allocated several time slots per TDM frame as illustrated
in FIG. 12B where a connection may have one to four time slots. At least one time
slot per TDM frame, herein called a control time slot, may be used for
communicating control signals. FIG. 12C illustrates a time-sharing scheme where
successive bursts, possibly belonging to different data streams, are transmitted
along a time-shared channel connecting a first node to a second node. A control
burst, containing control data from one node to another, is inserted at arbitrary time
intervals. FIG. 12C illustrates the propagation of bursts labeled 90 to 95 from the
first node to the second with a control burst containing transfer requests for bursts
labeled 160 to 184 interleaving the data bursts.

FIG. 12D illustrates data transfer using both a burst mode and a TDM mode. 
Time is organized in time frames 1210 (1210-1 to 1210-6 in this example). Each time
frame 1210 is dynamically divided between a sub-frame 1220 to be used for TDM
and a sub-frame 1230 to be used for burst transfer. The boundaries between
sub-frames 1220 and 1230 are flexible. The duration T of the time frame 1210 is selected
to be large enough to allow a contiguous period of time to schedule bursts of variable
sizes, and short enough to render the delay for a TDM connection using one time
slot per frame 1210 acceptable. With the duration T selected to be 1 millisecond, the
time-slot duration Δ for the TDM sub-frame selected to be one microsecond, and the size
of each data burst limited so that the burst duration on a shared channel does not exceed
10 microseconds, for example, a sub-frame 1220 of 250 time slots
would have a duration of 250 microseconds, thus leaving a contiguous interval of
750 microseconds for sub-frame 1230. An interval of 750 microseconds would
accommodate a minimum of 75 data bursts, each limited to a duration of 10
microseconds. It is noted that at a channel capacity of 10 Gb/s, a burst of 10
microseconds includes 12500 octets (100,000 bits), which is much larger than a
typical data packet and a data burst may then contain numerous data packets. 
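
The arithmetic of this example can be reproduced with a few lines of Python. The sketch below is illustrative only; the function name partition_frame and its parameter names are hypothetical, and all durations are assumed to be whole microseconds.

```python
def partition_frame(frame_us, slot_us, tdm_slots, max_burst_us, channel_gbps):
    """Split a time frame into a TDM sub-frame and a burst sub-frame.

    Returns (tdm_us, burst_us, min_bursts, max_burst_bits, max_burst_octets).
    """
    tdm_us = tdm_slots * slot_us              # duration of the TDM sub-frame
    burst_us = frame_us - tdm_us              # contiguous burst sub-frame
    min_bursts = burst_us // max_burst_us     # bursts of maximum duration that fit
    bits_per_us = channel_gbps * 1000         # 1 Gb/s carries 1000 bits per microsecond
    max_burst_bits = bits_per_us * max_burst_us
    return tdm_us, burst_us, min_bursts, max_burst_bits, max_burst_bits // 8


# Values of the example: T = 1 ms, 1-microsecond slots, 250 TDM slots,
# bursts limited to 10 microseconds, channel capacity of 10 Gb/s.
print(partition_frame(1000, 1, 250, 10, 10))
# prints (250, 750, 75, 100000, 12500)
```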

A scheduler handling interleaved TDM-burst intervals is required to ensure 
that a burst is completely contained in a sub-frame 1230. This may lead to a slight 
waste, less than the maximum duration of a burst (10 microseconds in the above
example) per sub-frame 1230. The waste can be further reduced by scheduling
control bursts, which typically have relatively short duration, at the end of a sub-frame
1230. 

In order to realize high scheduling efficiency, temporal-packing techniques, 
well-known in the art, may be employed. For example, time-slots for the TDM
sub-frame 1220 may be attempted starting from a fixed position, proceeding across the
sub-frame 1220 in a predetermined order. To increase the contiguous time-interval
to be used for sub-frame 1230, the fixed position should be the start or end of a
sub-frame 1220, and the predetermined order should be a forward sequential order
when the fixed position is the start of sub-frame 1220 or a backward sequential
order when the fixed position is the end of sub-frame 1220. It is noted that, during
time-frame 1210, the scheduling process may perform temporal-matching processes 
of input and output ports of core nodes over successive (future) time frames 1210. 
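
A minimal sketch of such a packing rule, assuming that the occupancy of each port over the TDM sub-frame is kept as a list of booleans (one entry per time slot), is given below in Python. The function name allocate_tdm_slots and the data layout are hypothetical and are used for illustration only; a slot is selected only if it is free at both the input port and the output port, and the search proceeds sequentially from the fixed position.

```python
def allocate_tdm_slots(input_busy, output_busy, slots_needed, forward=True):
    """First-fit temporal matching of an input port and an output port.

    input_busy / output_busy: lists of booleans, one per time slot of the
    TDM sub-frame (True means the port is already occupied in that slot).
    Returns the list of allocated slot indices, or None if the request
    cannot be satisfied (in which case nothing is modified).
    """
    if slots_needed <= 0:
        raise ValueError("slots_needed must be positive")
    n = len(input_busy)
    # Forward order starts at the beginning of the sub-frame,
    # backward order starts at its end.
    order = range(n) if forward else range(n - 1, -1, -1)
    chosen = []
    for s in order:
        if not input_busy[s] and not output_busy[s]:
            chosen.append(s)
            if len(chosen) == slots_needed:
                break
    if len(chosen) < slots_needed:
        return None
    for s in chosen:  # commit only when the whole request fits
        input_busy[s] = True
        output_busy[s] = True
    return chosen
```

Because every request is packed toward the same end of the sub-frame, occupied slots tend to cluster there, which is what leaves the longer contiguous interval for burst transfer.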

FIG. 13 illustrates burst scheduling in a space switch having five input ports 
labeled X0, X1, X2, X3, and X4 and five output ports labeled Y0, Y1, Y2, Y3, and Y4.
The figure illustrates a burst 1312-1 transferred from input port X0 to output port Y1
and a burst 1312-9 transferred from input port X2 to output port Y0. The scheduling
process preferably attempts to reduce unused time intervals between successive
bursts. A burst scheduler that attempts to reduce the idle time between successive
bursts is described in the aforementioned Applicant's United States patent
application 09/750,071, filed on December 29, 2000 and titled "Burst Switching in a
High-Capacity Network", the specification of which is incorporated herein by
reference.

FIG. 14 illustrates a burst-scheduling process in a space switch having five 
input ports and five output ports using time frames each including a burst sub-frame 
and a TDM sub-frame as described with reference to FIG. 12D. Exemplary bursts
transferred from input to output are likewise labeled in FIG. 14. The scheduler 
described in the aforementioned patent application 09/750,071 can be adapted to 
schedule interleaved burst-TDM switching by treating the TDM frames as pre- 
scheduled bursts. 
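
The idea of treating the TDM sub-frames as pre-scheduled bursts can be illustrated with the following Python sketch. It is not the scheduler of the aforementioned application 09/750,071; it is a simplified earliest-fit search under stated assumptions: each port keeps a sorted list of busy intervals (start, end) in microseconds, the TDM sub-frame is assumed to occupy the start of each time frame, and the function names prebook_tdm, earliest_start, and schedule_burst are hypothetical.

```python
import bisect


def prebook_tdm(busy, frame_us, tdm_us, frames):
    """Mark the TDM sub-frame of each of the next `frames` time frames as busy."""
    for k in range(frames):
        bisect.insort(busy, (k * frame_us, k * frame_us + tdm_us))


def earliest_start(in_busy, out_busy, duration, not_before=0):
    """Earliest time at which both ports are free for `duration`."""
    t = not_before
    for start, end in sorted(in_busy + out_busy):
        if t + duration <= start:
            break            # the burst fits before this busy interval
        if end > t:
            t = end          # overlap: retry after the busy interval
    return t


def schedule_burst(in_busy, out_busy, duration, not_before=0):
    """Reserve both ports for a burst and return its start time."""
    t = earliest_start(in_busy, out_busy, duration, not_before)
    bisect.insort(in_busy, (t, t + duration))
    bisect.insort(out_busy, (t, t + duration))
    return t


# Two frames of 1000 microseconds, each with a 250-microsecond TDM sub-frame.
x0, x2, y1 = [], [], []
for port in (x0, x2, y1):
    prebook_tdm(port, 1000, 250, 2)
print(schedule_burst(x0, y1, 10))    # prints 250
print(schedule_burst(x0, y1, 745))   # prints 1250
print(schedule_burst(x2, y1, 10))    # prints 260
```

The second burst cannot be completed within the remainder of the first burst sub-frame, so it is deferred to the burst sub-frame of the next time frame, which keeps each burst completely contained in a burst sub-frame. Beyond the pre-booked horizon of two frames the sketch applies no TDM reservation, which a practical scheduler would have to extend.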

Bimodal Edge Node

FIG. 15 illustrates a configuration of an exemplary edge node 1510 having 
nine input ports and nine output ports. An additional control input port and an 
additional control output port may also be provided. The input ports are divided into 
ingress ports 1514 receiving signals from subtending sources (not illustrated) and
inbound ports 1516 receiving signals from external nodes. The output ports are
divided into egress ports 1524 delivering signals to subtending sinks (not illustrated)
and outbound ports 1526 sending signals to external nodes. An external node may
be a core node or another edge node. An ingress port 1514 may be adapted to
receive data packets or data bursts from subtending sources and an egress port
1524 may be adapted to transmit data packets or data bursts to subtending sinks. A
data burst may include several packets. An inbound port 1516 is adapted to receive
data bursts from external nodes and parse each burst into separate constituent data
packets that may be directed to different egress ports 1524 and outbound ports
1526. An outbound port 1526 is adapted to receive packets from ingress ports 1514
and inbound ports 1516 through the switching fabric 1540 and assemble packets of
the same destination into data bursts. An edge node 1510 preferably switches
signals from ingress ports to egress ports and outbound ports and from inbound
ports to egress ports. However, it may be desirable that an edge node 1510 also
perform a tandem-switching function and transfer data from inbound ports 1516 to
outbound ports 1526. A controller 1550 of edge node 1510 receives control signals
from input ports either through a data bus connecting the input ports or through the
switching fabric in several ways well known in the art. Likewise, the edge node
controller 1550 communicates with output ports through a bus or through the 
switching fabric. Controller 1550 may communicate with the input and output ports 
through the switching fabric 1540, a control input port 1511, and a control output port
1522 as illustrated. Control signals from the edge controller 1550 are sent to output
ports of the edge node through control input port 1511 and the switch fabric 1540.
Control signals received at the input ports of the edge node are sent to the edge
controller 1550 through the switch fabric 1540 and the control output port 1522.

At least one outbound port of edge node 1510 has a burst-formation device 
that aggregates data packets into data bursts according to one of the burst-transfer
modes described above. The burst-formation device will be described below with 
reference to FIG. 35. 

Each core node is provided with a core controller. The controller receives 
control signals from incoming links and transmits control signals through outgoing
links. A link may carry signals in several spectral bands, often referenced as
wavelength channels. With photonic switching, baseband data may not be detected
from incoming optical signals and the controller has to rely on either of two methods
of acquiring the control information. In a first method of communicating control
information, the control signals are carried in a dedicated spectral band (dedicated
wavelength channel) and the method is often referenced as an 'out-of-band-control'
method. Dedicated spectral bands would be used within an upstream fiber link from
an edge node to a core node, a downstream fiber link from the core node to the edge
node, and possibly a core link connecting core nodes. In a second method of
communicating control information, the control signals are time-interleaved with
payload signals received from and transmitted to an edge node. The signals are
retrieved by temporal identification of their relative position in a time frame. This 
process requires that the edge node be time-locked to the core node. 

FIG. 16 illustrates a core node 1620 adapted for out-of-band control. The core 
node comprises an optical switch fabric 1640. An input fiber link 1612 is 
demultiplexed at a demultiplexer 1618 into component wavelength channels. At the
output of the optical switch fabric 1640, groups of output payload wavelength
channels 1623 are received by multiplexers 1688 where the output payload
wavelength channels 1623 are multiplexed into output fiber links 1632. 

The core node includes a core controller 1650 for controlling the operation of 
the optical switch fabric 1640. To communicate with controllers of the electronic edge 
nodes and the core node, each input fiber link 1612 and output fiber link 1632 may
include a control wavelength channel dedicated to carry control signals. An input
control wavelength channel 1622 is sent to the core controller 1650 through an
optical-to-electrical (O-E) interface 1602. Conversely, the core controller 1650 sends
an output control wavelength channel 1624, through an electrical-to-optical (E-O)
interface 1604, to each multiplexer 1688 to be included in the signal placed on the
output fiber links 1632. 

The core controller 1650 includes an admission controller 1625 that receives 
connection requests from edge nodes for the establishment of connections through 
the core node to other edge nodes. The admission controller 1625 may also receive
requests from edge nodes for increments or decrements in the capacity of previously
established connections, or the complete removal of previously established
connections. Based on a record of the occupancy of the input ports and output ports
of the optical switch fabric 1640, the admission controller 1625 may accept or reject a
given request for a capacity increment. A slave controller 1630 connected to core
controller 1650 periodically receives from core controller 1650 a connection schedule
for each input port and configures the switch fabric 1640 accordingly. The connection 
schedule specifies time instants at which each input port establishes a connection to 
a corresponding output port. 
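
The occupancy-based acceptance rule can be sketched as follows in Python. This is an illustrative simplification rather than the admission controller 1625 itself: the class name OccupancyRecord is hypothetical, capacity is assumed to be counted in time slots per frame, every port is assumed to have the same capacity, and only per-port totals (not individual connections) are tracked.

```python
class OccupancyRecord:
    """Per-port occupancy of a switch fabric, in time slots per frame."""

    def __init__(self, num_inputs, num_outputs, port_capacity):
        self.port_capacity = port_capacity
        self.in_used = [0] * num_inputs
        self.out_used = [0] * num_outputs

    def request_change(self, in_port, out_port, delta):
        """Apply a capacity increment (delta > 0) or decrement (delta < 0).

        Returns True and updates the record if the change is acceptable at
        both the input port and the output port; otherwise returns False
        and leaves the record unchanged.
        """
        new_in = self.in_used[in_port] + delta
        new_out = self.out_used[out_port] + delta
        if new_in < 0 or new_out < 0:
            return False  # cannot release more than is allocated
        if new_in > self.port_capacity or new_out > self.port_capacity:
            return False  # insufficient free capacity: reject the increment
        self.in_used[in_port] = new_in
        self.out_used[out_port] = new_out
        return True
```

A request is accepted only when both ends of the internal path have spare capacity, so a heavily loaded output port causes rejection even when the input port is idle.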

FIG. 17 presents a second core node 1720 adapted for in-band control. In-band
control time multiplexes control signals with payload signals as described
above. When in-band control is used, the transmitting device is required to time-lock
to the receiving device. In-band control may be realized only when the input 
channels and output channels of the core node are time-shared either in a TDM 
mode or in a burst-transfer mode.

The core node 1720 comprises an optical switch fabric 1740. An input fiber
link 1712 is demultiplexed at a demultiplexer 1718 into component wavelength
channels. At the output of the optical switch 1740, groups of output payload
wavelength channels 1723 are received by multiplexers 1788 where the output
payload wavelength channels 1723 are multiplexed into output fiber links 1732. The
core node 1720 includes a core controller 1750 for controlling the operation and
connectivity of the optical switch fabric 1740. Core controller 1750 has a channel from
the switch fabric 1740 through an O-E interface 1702 and a channel to the switch
fabric 1740, through an E-O interface 1704. Core controller 1750 has an internal
connection to a slave controller 1730 which configures the connectivity of the switch
fabric 1740 according to instructions received from the core controller 1750. At least
one wavelength channel within each input fiber link 1712 may be time shared. The
core controller 1750 of FIG. 17 includes an admission controller 1725. Core
controller 1750 periodically sends a connection schedule to slave controller 1730
which configures the switch fabric 1740 accordingly. The connection schedule
specifies time instants at which each input port establishes a connection to a 
corresponding output port. 

The optical switch fabric 1640 or 1740 may comprise a single switching plane 
or parallel switching planes. If a single switching plane is used, wavelength 
converters may be needed at input or at output.

Each of the control functions of the core controller 1650 or core controller 
1750 can be implemented in application-specific hardware, which is the preferred 
implementation when high speed is a requirement. However, in an alternative 
implementation, the core controller 1650 or 1750 may be loaded with controlling 
software for executing methods exemplary of this invention.

FIG. 18 illustrates a prior-art core node 1820 (corresponding to core node 220 
of FIG. 2) having buffers at each inlet port 1814, inward port 1816, outlet port 1824, 
and outward port 1826. A core controller 1850 receives control signals from inlet 
ports 1814 and inward ports 1816 through the switch fabric 1840 and output control 
port 1822. The core controller 1850 distributes control signals to outlet ports 1824
and outward ports 1826 through input control port 1812 and the switch fabric 1840.
Time coordination with subtending edge nodes and other core nodes 1820 is not
required due to the availability of buffers which permit both time alignment and
temporal decoupling. This arrangement is suitable if the switch fabric 1840 is
electronic. With an optical switch fabric, each input buffer would require
optical-to-electrical (O-E) conversion and electrical-to-optical (E-O) conversion, and each
output buffer requires O-E and E-O conversion (not illustrated). 

FIG. 19 illustrates a core node 1920 (corresponding to core node 320) having
buffers at selected inward ports 1917. Some inward ports, referenced as 1916, may
be bufferless and all other ports are bufferless. The buffers at the selected inward
ports have control paths 1980 to core controller 1950. Control paths 1980 may share
a common internal channel (not illustrated). The control paths are necessary for
timing the transfer of signals across the switch fabric 1940. A core controller 1950
receives control signals from inlet ports 1914 and inward ports 1917 through the
switch fabric 1940 and output control port 1922. The core controller 1950 distributes
control signals to outlet ports 1924 and outward ports 1926 through input control port
1912 and the switch fabric 1940. Inlet channels, originating from subtending edge
nodes and terminating on inlet ports, can be time-locked due to the availability of
buffers at the electronic edge nodes. Inward channels arriving from other core nodes
1920 to inward ports 1917 that are provided with buffers need not be time locked.
An inward channel arriving at an inward port 1916 that is not provided with a buffer is
switched in its entirety to either an outward channel 1926 leading to another core 
node or to an outlet channel 1924 leading to a subtending edge node. 

The core controller 1950 includes a master time counter and a master timing
circuit (not illustrated) adapted to transmit and receive timing signals. The master
timing circuit distributes a time reference to inward ports 1917. The core controller
1950 is adapted to receive connection requests from, and transmit connection 
schedules to, subtending edge nodes. The core controller 1950 further includes a 
scheduling mechanism for computing the connection schedules as will be described 
with reference to FIG. 33. 

The core controller 1950 may receive burst-transfer requests and compute
corresponding burst-transfer schedules. The core controller 1950 may also receive
flow-rate-allocation requests and schedule a corresponding number of time slots per
time-division-multiplexed frame. Alternatively, the core controller 1950 may compute
burst-transfer permits based on the flow-rate-allocation requests. The core controller
1950 may further be adapted to allocate an entire channel to an outward port leading
to a selected edge node subtending to another core node.

FIG. 20 illustrates a core node 2020 (corresponding to a core node 420) 
having buffers at selected outward ports 2027. Some outward ports, referenced as 
2026, may be bufferless and all other ports are bufferless. A core controller 2050, 
similar to core controller 1950, receives control signals from inlet ports 2014 and
inward ports 2016 through the switch fabric 2040 and output control port 2022. The
core controller 2050 distributes control signals to outlet ports 2024 and outward ports
2026/2027 through input control port 2012 and the switch fabric 2040. Inlet channels,
originating from subtending edge nodes and terminating on inlet ports, can be
time-locked due to the availability of buffers at the electronic edge nodes. An inward
channel arriving from an outward port 2027 of another core node, where the outward
port 2027 has a buffer, can be time locked to the core node 2020. An inward
channel arriving at a core node 2020A from a bufferless outward port 2026 of
another core node is switched in its entirety to either an outward channel connecting
to outward port 2026 of core node 2020A leading to another core node or to an outlet
channel connecting to an outlet port 2024 leading to a subtending edge node.

FIG. 21A illustrates signal switching within an optical switch fabric 1940X from
an inward port 1917, provided with a buffer, to a plurality of outlet ports 1924 (1924-1
to 1924-k) and outward ports 1926. A port 1917 may receive an inward channel
carrying time-limited signals destined to edge nodes 1510 (FIG. 15) subtending to
optical switch fabric 1940X and to other core nodes. The edge nodes subtending to
switch fabric 1940X are labeled 1510X1 to 1510Xk. A signal destined to an edge
node 1510 is switched to an outlet port 1924 connecting to the edge node. An edge
node connects to a core node through at least one inlet port and at least one outlet
port. A signal destined to another core node is switched from inward port 1917 to a
respective outward port of switch fabric 1940.

FIG. 21B illustrates signal switching from a bufferless inward port 1916 to a
plurality of outlet ports 1924 (1924-1, 1924-2, etc.) in an optical switching fabric
1940Y. An inward port 1916 may receive an inward channel carrying time-limited
signals to be processed by a subsequent core node (not illustrated) or to subtending
edge nodes 1510. Due to the absence of a buffer at the inward port, the inward
channel must be switched in its entirety to either an outward port leading to another
core node or to an outlet port leading to a single subtending edge node, such as
edge-node 1510Y2 in the illustrated example. Edge node 1510Y2, which is equipped
with buffers, may then switch the time-limited signals destined to neighboring edge
nodes 1510Y1, 1510Y3, ..., 1510Yk, through the switch fabric 1940Y. Thus, the
arrangement of FIG. 21B requires that a portion of the signals destined to the
subtending edge nodes be switched twice through the switch fabric, first along a path
2150 from the inward port to one of the edge nodes 1510, then along path 2160 from 
an inlet port to respective outlet ports. 

To realize high capacity, some of the core nodes may comprise a multi-plane
switch having two or more switch planes, each plane being an optical switch fabric.
FIG. 22 illustrates two core nodes 2220A and 2220B. A core link 2250 between the
two core nodes 2220A and 2220B may be a wavelength-division-multiplexed (WDM)
fiber-optic link. Each of the two core nodes in this example has four switch planes
2240 (2240A1 to 2240A4 and 2240B1 to 2240B4). An edge node may connect to a
multi-plane core node by several upstream channels and several downstream
channels. The switch planes of a core node need not be identically configured. In the
illustrated example, two switch planes, 2240A1 and 2240A2, have no buffers while
switch planes 2240A3 and 2240A4 of core node 2220A have the configuration of
FIG. 19 where some inward ports are provided with buffers. Core node 2220B is
similarly configured with two switch planes 2240B1 and 2240B2 having the
configuration of FIG. 19 and the other two switch planes, 2240B3 and 2240B4, are
bufferless. Two core channels 2252 connect the bufferless switch fabrics 2240A1
and 2240A2 of core node 2220A to the two switch fabrics 2240B1 and 2240B2 of
core node 2220B that have inward buffers as illustrated. Two core channels, 2254,
connect switch fabrics 2240A3 and 2240A4 to the two bufferless switch fabrics
2240B3 and 2240B4. A path along a core channel 2252 may carry time-limited
signals to be switched in a switch plane 2240B1 or 2240B2 in a manner similar to
that described above with reference to FIG. 21A. A path along a core channel 2254
may carry time-limited signals to be switched in a switch plane 2240B3 or 2240B4,
with some signals possibly double switched through switch plane 2240B3 or 2240B4
as described with reference to FIG. 21B.

FIG. 23 illustrates a path from an edge node 1510X, adjacent to a core node
1920A, to a core node 1920U. The path comprises an upstream channel 1512X and
a core channel 1916A4 that connects a bufferless outward port of core node 1920A
to a bufferless inward port of core node 1920U. The upstream channel 1512X is
exclusively switched to the core channel 1916A4 through core node 1920A. In order
to establish the internal path through core node 1920A, edge node 1510X must be
able to communicate with the controller 1950A of core node 1920A. Furthermore, in
order to establish switched paths from edge node 1510X to output ports of core node
1920U, which include internal paths 2380, edge node 1510X must be time-locked to
core node 1920U.

The edge node 1510X may have out-of-band control channels (not illustrated)
to and from the controller 1950A of adjacent core node 1920A, in which case the
edge node 1510X can send a connection request to core controller 1950A to
establish the internal connection across core node 1920A. Out-of-band control
requires that each edge node dedicate a control channel to each adjacent core node. 

If the edge node 1510X is to transmit in-band time-interleaved control signals
to core controller 1950A, it must apply a time-locking process as described earlier
with reference to FIG. 10. Time-locking edge node 1510X to core node 1920A
precludes time locking the edge node 1510X to core node 1920U. However, if edge
node 1510X has at least two upstream channels 1512 to core node 1920A, then at
least one of the upstream channels may time-lock to core node 1920A in order to
enable edge node 1510X to exchange control signals with core controller 1950A,
and at least one of the remaining upstream channels from edge node 1510X to core
node 1920A may be channel switched (i.e., exclusively switched) through internal
connection 2340 to core channel 1916A4 leading to an inward port of core node
1920U. Core node 1920U would then become effectively adjacent to edge node
1510X, until the path from the edge node 1510X to core node 1920U is terminated,
and the aforementioned time-locking process can be applied with an exchange of
timing messages between the edge controller of edge node 1510X and core
controller 1950U. In FIG. 23, upstream channel 1512X* is time-locked to core node
1920A. Hence time-limited signals carried by upstream channel 1512X* can be
switched to core controller 1950A through internal path 2341 and through internal
paths 2343 to core channel 1916A5 leading to a buffered input port of core node
1920U from which the time-limited signals may be switched to several output ports of
core node 1920U. Core controller 1950A sends signals to edge node 1510X through
switched internal path 2342. The output ports of an edge node may time-lock to
different core nodes. 

FIG. 24 illustrates a time-locked path from an edge node 1510Y to a core 
node 2020U. The path comprises an upstream channel 1512Y and a core channel
2016A4. An upstream channel 1512Y* is time-locked to core node 2020A and in-band
signaling is used in a manner similar to that described with reference to FIG.
23. Time-limited signals carried by upstream channel 1512Y* are switched to core
controller 2050A through internal path 2441, and core controller 2050A sends signals
to edge node 1510Y through switched internal path 2442. Time-limited signals
carried by upstream channel 1512Y* are switched to a buffered outward port 2027A
of core node 2020A. The buffered outward port 2027A can time-lock to core node 
2020U and, hence, can switch the time-limited signals it receives from the plurality of 
input ports of core node 2020A to several output ports of core node 2020U. 

Connectivity

The number of adjacent core nodes to an edge node is determined by the 
number of upstream links of the edge node. For example, if each edge node has 128
outbound ports, each port supporting one wavelength channel, and WDM links
each carrying a multiplex of 32 wavelength channels are used, then an edge node can directly
access a maximum of four core nodes using four WDM upstream links and four
WDM downstream links. Thus, the edge node would have four adjacent core nodes.
Each core node preferably has a number of parallel switch planes equal to the
number of wavelength channels per WDM link; 32 in this example. To form a
high-performance composite-star network, each edge node connects to each core node.
The number of edge nodes would be equal to the number of dual ports per switch
plane (a dual port comprises an input port and an output port). The entire network
would then have only four core nodes, which limits the geographic coverage of the
network. Increasing the number of core nodes would require that each WDM link
carry fewer wavelength channels. In order to increase the number of core nodes
while exploiting WDM economy, wavelength routers, well known in the art, may be
used to direct wavelength channels of WDM links, originating from several edge
nodes, to several core nodes.
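
The trade-off between the number of wavelength channels per WDM link and the number of directly accessible core nodes follows from a simple division, as the following illustrative Python sketch shows. The function name adjacent_core_nodes is hypothetical, and the sketch assumes one WDM upstream link per adjacent core node and no wavelength routers.

```python
def adjacent_core_nodes(outbound_ports, channels_per_wdm_link):
    """Maximum number of core nodes an edge node can access directly,
    assuming one WDM upstream link per core node and one wavelength
    channel per outbound port."""
    return outbound_ports // channels_per_wdm_link


for channels in (32, 16, 8):
    print(channels, adjacent_core_nodes(128, channels))
# prints:
# 32 4
# 16 8
# 8 16
```

With 32 channels per link the edge node reaches four core nodes, each preferably having 32 switch planes; halving the channels per link doubles the number of reachable core nodes at the cost of WDM economy.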

MESH NETWORKS 

FIG. 25 illustrates a meshed network 2500 comprising optical core nodes 
1920 (1920A, 1920B, 1920C, and 1920D). Each core node comprises an optical
switch fabric 1940 (FIG. 19, not illustrated in FIG. 25), a plurality of inward ports
1916/1917 receiving optical signals from other core nodes, and a plurality of outward
ports 1926 transmitting optical signals to other core nodes. At least one outward port
1926 of each core node 1920 has a core channel connecting to an inward port 1917
of another core node 1920. At least one of the inward ports 1917 has a buffer
controller (not illustrated) for time aligning signals received at the inward buffer. Each
core node 1920 includes a core controller 1950 (FIG. 19, not illustrated in FIG. 25)
operable to communicate with core controllers of other core nodes to exchange
connection-scheduling data. A core node 1920 may serve as a transit node for
other core-node pairs. A core node may also have a plurality of inlet ports 1914
receiving optical signals from subtending edge nodes and a plurality of outlet ports
1924 transmitting optical signals to subtending edge nodes.

A connection from illustrated source edge node 1510X subtending to core
node 1920A to a destination edge node 1510Y subtending to core node 1920C may
be routed directly along a core channel from an outward port 1926-1 of core node
1920A to an inward port 1917 of core node 1920C. A connection from source edge
node 1510X to destination edge node 1510Y may also be routed through core node
1920B or core node 1920D. A connection through outward port 1926-2 and core 
node 1920D is illustrated in FIG. 25. 

The inward ports 1917 of the second core node 1920C may receive 
uncoordinated signals from other core nodes 1920 and, hence, the received signals
may require time alignment. Thus, each received signal may have to be delayed
before being switched through the bufferless switch fabric of core node 1920C. 

In a full mesh structure having N > 1 core nodes, where each core node 
connects to each other core node, there may be (N-1) indirect paths from any source
edge node subtending to a given core node to any destination edge node subtending
to another core node. An indirect path may carry time-multiplexed signals from 
several edge nodes if each inward port it traverses is provided with a buffer. 

FIG. 26 illustrates a meshed network 2600, similar to network 2500,
comprising optical core nodes 2020 (2020A, 2020B, 2020C, and 2020D). Each core
node 2020 comprises an optical switch fabric 2040 (FIG. 20, not illustrated in FIG.
26), a plurality of inward ports 2016 receiving optical signals from other core nodes,
and a plurality of outward ports 2026/2027 transmitting optical signals to other core
nodes. At least one outward port of each core node 2020 has a core channel
connecting to an inward port of another core node. At least one of the outward ports
2027 has a buffer controller for time locking the buffer to another core node.

Each core node includes a core controller 2050 (FIG. 20, not illustrated in 
FIG. 26) operable to communicate with core controllers of other core nodes to 
exchange connection-scheduling data. A core node may serve as a transit node for 
other core-node pairs. A core node may also have a plurality of inlet ports 2014 
receiving optical signals from subtending edge nodes and a plurality of outlet ports
2024 transmitting optical signals to subtending edge nodes. 

A connection from illustrated source edge node 1510X subtending to core
node 2020A to a destination edge node 1510Y subtending to core node 2020C may
be routed directly along a core channel from an outward port 2027-1 of core node
2020A to an inward port 2016 of core node 2020C. A connection through outward
port 2027-2 and core node 2020D is illustrated in FIG. 26.

To enable switching time-limited signals, an outward port 2027 of core node
2020A time-locks to an adjacent core node. Time-locking implies delaying signals at
the outward port 2027 of core node 2020A. The required delay at the outward port
2027 may be realized through a process of optical-to-electrical (O-E) conversion of
an optical signal, placing the electrical output in an electronic memory, and reading
the stored signal after a predetermined period of time. The delayed electronic signal
then modulates an optical carrier (a wavelength) which is transmitted to the adjacent
core node. Network 2600 differs from network 2500 in the application of delay to
signals transmitted from a first core node to a second core node. In network 2600,
the delay is applied at the outward port of the first core node while in network 2500
the delay is applied at the inward port of the second core node. The delay may be
applied anywhere along the core channel connecting the outward port of the first
core node to the inward port of the second core node. It is, however, preferable that
the delay be applied at either end of the core channel.

BIMODAL BURST TRANSFER 

A core controller 1950/2050 may be adapted to function as a bimodal
controller. A bimodal controller receives both first-mode and second-mode
burst-transfer requests. In a first-mode request, data bursts may be rate regulated at an
outbound port of a source edge node and descriptors of each data burst waiting at
the outbound port and eligible for transfer to the core node are sent to the controller
of the core node. The eligibility of a waiting data burst is determined by a flow-rate
regulator. In the second mode, flow-rate allocations for each burst stream are
determined and communicated to the bimodal controller of the core node. The
bimodal controller then determines a burst descriptor for each burst stream and
generates regulated burst-transfer requests that are substantially evenly spaced in
the time domain. The burst descriptors received from the outbound port according to
the first mode, and the burst descriptors generated by the core controller according
to the second mode, are submitted to the same burst scheduler associated with the
core node, which computes a schedule for each individual data burst, regardless of
whether it is a waiting burst (first mode) or an anticipated burst (second mode), and
sends the burst schedules to the outbound port of the edge node. A burst scheduler
is described in the aforementioned United States patent application 09/750,071.
Thus, a first-mode request is required for each waiting burst at an outbound port
while a second-mode request is required only when the flow-rate-allocation of a burst
stream is modified. The burst schedules transmitted by the bimodal core controller
identify the mode associated with each burst to enable the outbound port to transmit
the appropriate burst. Notably, the outbound port must be time-locked to the core
node in order to enable the outbound port to time the transmission of bursts
according to the schedule it receives from the core node.

Proximity and Storage Indices

The round-trip propagation delay between an edge node and a core node
may be used to define their proximity and, hence, the mode of burst transfer. A
proximity index may be defined as the ratio of the round-trip propagation delay to a
prescribed delay tolerance. For example, if the round-trip propagation delay from an
edge node to a first core node is 1 millisecond and the delay tolerance for a first
burst stream transmitted from the edge node to the first core node is 500
microseconds, then the proximity index is 2.0. If the round-trip propagation delay
from the edge node to a second core node is 20 milliseconds and the delay
tolerance of a second burst stream transmitted from the edge node to the second
core node is 100 milliseconds, then the proximity index is 0.20. Thus, based on
absolute proximity, the first mode of burst transfer would be selected for use
between the edge node and the first core node while the second mode would be
selected for use from the edge node to the second core node. However, based on
the proximity indices, the selection is reversed with the first burst stream using the
second mode of burst transfer.
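
The proximity-index arithmetic above may be sketched as follows. This is an illustrative sketch only; the function name proximity_index and the use of seconds as the common unit are assumptions made for the example and are not part of the specification.

```python
def proximity_index(round_trip_delay_s: float, delay_tolerance_s: float) -> float:
    """Ratio of the round-trip propagation delay to the prescribed delay tolerance.

    Both arguments are assumed to be expressed in seconds.
    """
    if delay_tolerance_s <= 0:
        raise ValueError("delay tolerance must be positive")
    return round_trip_delay_s / delay_tolerance_s


# First core node: 1 millisecond round trip, 500 microsecond tolerance.
first_index = proximity_index(1e-3, 500e-6)
# Second core node: 20 millisecond round trip, 100 millisecond tolerance.
second_index = proximity_index(20e-3, 100e-3)
```

With these inputs the nearer core node has the larger index, which is why a selection based on proximity indices reverses the selection based on absolute proximity.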

Another factor that may be considered in determining the burst-transfer mode
is the flow rate of a burst stream. The flow rate may be expressed in bits per second.
The product of the aggregate flow rate p of burst streams using the first mode of
burst transfer and the round-trip propagation delay t (in seconds) approximates the
cumulative mean size of the data bursts waiting at a respective outbound port of an
edge node. This value may be restricted to be an acceptable fraction of the buffer
size at the respective outbound port. Thus, a storage index may be defined as the
ratio p×t/A, where A is the total size of the outbound buffer under consideration. If the
addition of a new burst stream to burst streams already using the first mode results
in a storage index exceeding a predefined threshold, the new burst stream uses
the second mode of burst transfer. The predefined threshold is preferably less than
unity, 0.8 for example, to avoid buffer overflow.
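
The storage-index test for admitting a new burst stream to the first mode may be sketched as below. The function names, the choice of bits as the buffer unit, and the example flow rates are assumptions made for illustration only.

```python
def storage_index(aggregate_rate_bps: float,
                  round_trip_delay_s: float,
                  buffer_bits: float) -> float:
    """Aggregate first-mode flow rate times round-trip delay, over buffer size."""
    return aggregate_rate_bps * round_trip_delay_s / buffer_bits


def admits_first_mode(current_rates_bps, new_rate_bps,
                      round_trip_delay_s, buffer_bits, threshold=0.8):
    """True if adding the new stream keeps the storage index within the threshold.

    A False result means the new stream would use the second mode.
    """
    total_rate = sum(current_rates_bps) + new_rate_bps
    return storage_index(total_rate, round_trip_delay_s, buffer_bits) <= threshold


# Hypothetical port: 8e6-bit buffer, 1 ms round trip, 4 Gb/s already in first mode.
fits = admits_first_mode([4e9], 2e9, 1e-3, 8e6)      # index 0.75
overflows = admits_first_mode([4e9], 3e9, 1e-3, 8e6)  # index 0.875
```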
Thus, the proximity index and the storage index may be used to select the
burst transfer mode. The first burst transfer mode would be selected when the
proximity index does not exceed unity and the storage index does not exceed a
specified threshold. Otherwise, the second burst-transfer mode is used. The
specified threshold may be dynamic, being determined according to observed buffer
occupancy. Recall that a burst stream is defined according to the origin and
destination edge nodes and a burst stream sent from an edge node to a core node
may be directed to different output ports of the core node. The burst transfer mode
may also be selected according to combined proximity and storage indices.
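
The selection rule stated above may be expressed compactly as follows. The names Mode and select_mode are hypothetical, and the numbering of the first and second modes as 0 and 1 is adopted here only for convenience.

```python
from enum import IntEnum


class Mode(IntEnum):
    FIRST = 0   # edge node reports descriptors of waiting bursts
    SECOND = 1  # edge node reports flow-rate allocations


def select_mode(proximity_idx: float, storage_idx: float,
                storage_threshold: float = 0.8) -> Mode:
    """First mode only if neither index exceeds its limit; otherwise second mode."""
    if proximity_idx <= 1.0 and storage_idx <= storage_threshold:
        return Mode.FIRST
    return Mode.SECOND
```

A dynamic threshold, determined from observed buffer occupancy, would simply be passed as storage_threshold at each evaluation.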

Referring to the above example, where the round-trip propagation delay from
an edge node to a first core node is 1 millisecond and the delay tolerance is 500
microseconds while the round-trip propagation delay from the edge node to a second
core node is 20 milliseconds and the delay tolerance is 100 milliseconds, consider a
first burst stream from the edge node to the first core node with a flow rate of 1
gigabit per second (Gb/s) and a second burst stream from the edge node to the
second core node with a flow rate of 10 Mb/s. The buffer size at each of the
respective outbound ports is one megabyte (8×10^6 bits). The storage indices are
then 10^9 × 0.001/(8×10^6) = 0.125 for the first burst stream and 10^7 × 0.1/(8×10^6) =
0.125 for the second burst stream. Thus, if the specified storage threshold (less than
unity) exceeds 0.125, and because the proximity index of the edge node with respect
to the second core node is less than unity, the edge node may use the first mode of
burst transfer without violating the delay tolerance or overflowing the respective
outbound buffer of the edge node.

The core node is adapted to control the flow of data bursts according to one of
the two burst-transfer modes described above. With the first mode, burst-transfer
schedules are generated by a core controller of a core node for data bursts already
waiting at respective edge nodes. The size of each data burst is known and the
scheduler's task is to determine the required time of its transmission from the edge
node in a manner that avoids collision at an output port of the core node. With the
second mode, burst-transfer schedules are generated by the core controller for
anticipated data bursts. Both the size of each burst and its transmission time are
determined by the core controller. The size is determined according to a
flow-rate-allocation associated with each data stream.

Bimodal Burst-Switching Network 

The edge nodes subtending to a core node may be classified into two
categories with respect to the core node according to their proximity to the core
node. An edge node in the first category communicates with the core node according
to the first mode of burst transfer and an edge node in the second category
communicates with the core node according to the second mode of burst transfer.
An edge node within a specified proximity from a core node sends descriptors of
data bursts formed at an outbound port of the edge node to a core-node controller
and receives burst-transmission schedules from the core-node controller while an
edge node beyond the specified proximity sends flow-rate-allocation requests to the
core-node controller and receives burst descriptors and their transmission schedules
from the core-node controller.

The proximity of an edge node to a core node (and vice versa) may be
determined according to known lengths of an uplink and a downlink connecting the
edge node to the core node. Preferably, the proximity is determined by measuring
the round-trip propagation delay between the edge node and the core node. A
measurement of the round-trip propagation delay between an edge node and a core
node may be initiated by either the edge node or the core node. The edge node may
subtend to several other core nodes and the core node may have several other
subtending edge nodes. The core node has a single cyclic time counter and the
edge node may have a cyclic time counter dedicated to each core node to which it
subtends. At a selected instant of time, the edge node may send a measurement
signal to the core node. A controller of the core node receives the signal, appends an
indication of the interval of time during which the signal is delayed for processing at
the core controller, and transmits the appended signal back to the edge node. The
edge node may then determine the instant of time at which it receives the returned
signal and calculate the round-trip propagation delay based on the sending and
receiving instants of time and the core-node-delay indication.
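
The edge-initiated calculation may be sketched as follows. The handling of counter wrap-around by modular subtraction is an assumption made for this example (the specification states only that the counters are cyclic), as is the requirement that the elapsed time be shorter than one counter period; all names and values are hypothetical.

```python
def round_trip_delay(send_reading: int, receive_reading: int,
                     core_delay: int, counter_period: int) -> int:
    """Round-trip propagation delay in time-counter units.

    send_reading and receive_reading are read from the same cyclic counter at
    the edge node; core_delay is the processing delay reported by the core
    controller. The elapsed time is assumed to be less than counter_period.
    """
    elapsed = (receive_reading - send_reading) % counter_period
    delay = elapsed - core_delay
    if delay < 0:
        raise ValueError("reported core delay exceeds elapsed time")
    return delay


# Hypothetical counter of 1,000,000 ticks that wraps between sending and receiving.
rtt_ticks = round_trip_delay(send_reading=999_000, receive_reading=1_500,
                             core_delay=300, counter_period=1_000_000)
```

Because both readings come from the same counter, no time-locking between the edge node and the core node is needed for this measurement.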
Alternatively, the core node may initiate the measurement process by sending
a measurement signal to the edge node at a selected instant of time. The signal may
be delayed at the edge node. The edge node appends an indication of the delay time
and returns the appended signal to the core node which then calculates the
round-trip propagation delay. The calculated round-trip propagation delay is needed at the
edge node. The core node, therefore, sends the calculated round-trip propagation
delay back to the edge node.

The sending and receiving time of the measurement signal must be read from 
the same time counter. The calculation of the round-trip propagation delay must be 
performed by the node that initiates the measurement because the time-counters at
the edge nodes and core nodes are not mutually time locked. The time-counter of 
the edge node may be time locked to the time counter at the core node, but the 
reverse is not necessarily true. 

As described earlier, in the first mode of burst transfer, an edge node reports
descriptors of waiting bursts and requests a schedule for transferring each burst. In
the second mode of burst transfer, an edge node reports a required flow rate for
each data stream and a core node controller specifies a burst size for each data
stream and determines a schedule for transferring successive bursts. The burst
scheduler at the core node schedules bursts indiscriminately. It is preferable that
each core node be adapted to handle both modes of burst switching.

Applicant's United States patent application 10/437,676 filed on May 14, 2003 
and titled "Regulating data-burst transfer", the specification of which is incorporated 
herein by reference, describes a flow-rate-regulation device for data bursts of 
arbitrary sizes. The device is operable in either of two modes. In the first mode,
flow-rate-regulation is provided for already waiting bursts. In the second mode, the device
is adapted to provide flow-rate-regulation for anticipated bursts. 

FIG. 27 illustrates an optical core node 2720 connecting edge nodes 2710
that may be spread over a wide geographic area. Two of the edge nodes, each
labeled as 2710A, are within a proximity boundary defined by a round-trip
propagation delay between an edge node and the optical core node 2720. The
remaining edge nodes, each labeled as 2710B, are distant edge nodes, being
located outside the proximity boundary. According to an absolute proximity criterion,
a nearby edge node 2710A preferably uses the first mode of burst-transfer control
described earlier while a distant edge node 2710B preferably uses the second mode
of burst-transfer control in which the edge node sends flow-rate-allocation
requirements to a controller 2730 of the optical core node 2720 for the data streams
traversing the optical core node. These flow-rate allocations may change frequently,
every 1 millisecond for example. Using the proximity index defined above, rather
than an absolute proximity threshold, for selecting the burst transfer mode, the
pattern of FIG. 27 would apply only if all burst streams have the same delay
tolerance.

FIG. 28 illustrates an edge node 2710 connecting to core nodes 2720 within 
and beyond a proximity boundary with respect to the edge node 2710. Core nodes, 
referenced individually or collectively as 2720A, are within a proximity boundary with 
respect to the edge node 2710 and core nodes, referenced individually or collectively 
as 2720B, are beyond the proximity boundary. With uniform delay tolerance, edge
node 2710 communicates burst-scheduling requests to a core node 2720A within the 
proximity boundary but communicates flow-rate-allocation requests to a core node 
2720B beyond the proximity boundary, the proximity being determined on the basis 
of round-trip propagation delay, in accordance with an embodiment of the invention. 
An outbound port of edge node 2710 leading to an optical core node 2720A uses a 
flow-rate-regulation device operable according to the first mode of burst transfer 
while an outbound port of edge node 2710 leading to a core node 2720B uses a 
rate-regulation device operable according to the second mode of burst transfer. An 
outbound port may also be equipped with a bimodal flow-rate-regulation device 
which functions according to both modes. The control signal sent by the outbound 
port to a core node would include an indication of the required mode. 

FIG. 29 illustrates a first mode of burst transfer from an edge node 2710 to
two bufferless core nodes 2720 (FIG. 28). Control bursts 2910 are sent from edge
node 2710 to a neighbouring core node 2720A and to a distant core node 2720B.
The edge node 2710 sends, to the controller of core node 2720A, a control burst
2910A1 that includes descriptors of at least one burst waiting at a first outbound port
of the edge node connecting to an upstream channel leading to an inward port of
core node 2720A. Likewise, the edge node 2710 sends a control burst 2910B1 to
the controller of core node 2720B, the control burst indicating descriptors of bursts
waiting at a second outbound port of edge node 2710 connecting to an upstream
channel leading to an inward port of core node 2720B. Edge node 2710 receives
burst-transfer schedules 2920A1 from the controller of core node 2720A and
burst-transfer schedules 2920B1 from the controller of core node 2720B. Schedules
2920A1 indicate an instant of time at which each waiting burst at the first outbound
port is to arrive at core node 2720A. Schedules 2920B1 indicate an instant of time at
which each waiting burst at the second outbound port is to arrive at core node
2720B. When an outbound port of edge node 2710 is time-locked to the core node to
which it connects through an upstream channel, the required arrival time at the core
node is identical to the transmission time from the outbound port as read from a time
counter, according to the time-locking method described in the aforementioned
United States patent 6,570,872. Time at the outbound port and the core-node
controller is indicated by identical cyclical time counters.

While the edge node 2710 continues to transmit bursts according to a
received schedule, it may also send new control bursts 2910A2 and 2910B2 and
receive new schedules 2920A2 and 2920B2 so that idle time can be significantly
reduced or entirely eliminated. The schedulers at the core nodes 2720A and 2720B
schedule bursts in continuous future time as described in the aforementioned United
States patent application 09/750,071.

Consider, for example, the case where the round-trip propagation delays
between edge node 2710 and core nodes 2720A and 2720B are one millisecond and
100 milliseconds, respectively. The delay between sending a control burst 2910A1
and receiving a schedule 2920A1, which includes processing time, would be slightly
larger than one millisecond. The delay between sending a control burst 2910B1 and
receiving a schedule 2920B1 would exceed 100 milliseconds. A descriptor of a burst
is included in the control burst 2910B1 after the burst has already been formed at the
respective outbound port of the edge node 2710. It is preferable then, when the
round-trip propagation delay is large, to send the flow-rate allocation for the data
streams traversing core node 2720B and let the controller of core node 2720B
determine a nominal burst size and a transfer schedule according to the second
mode (mode 1) of burst transfer described earlier. In either of the two burst-transfer
modes (mode 0 or mode 1), flow-rate regulation may be required. In the first mode
(mode 0), the flow-rate regulation takes place at a respective outbound port of an
edge node and in the second mode (mode 1) the flow-rate regulation takes place at
a core-node controller.

FIG. 30 illustrates the use of the first burst-transfer control mode from the
edge node 2710 to a near core node 2720A and the second burst-transfer control
mode from the edge node to a distant core node 2720B. Control bursts 3010A1 are
sent from a first outbound port of edge node 2710 to neighbouring core node 2720A
in a manner similar to that of FIG. 29. The edge node sends a control burst 3030B1
to distant core node 2720B. Control burst 3030B1 contains flow-rate allocations for
burst streams transmitted from the edge node to core node 2720B. Edge node 2710
receives, at a first inbound port, burst-transfer schedules 3020A1 from the controller
of core node 2720A and, at a second inbound port, burst-transfer permits 3040B1
from the controller of core node 2720B. The schedules 3020 indicate an instant of
time at which each waiting burst at the first outbound port is to arrive at core node
2720A while the permits indicate a specified burst size and an instant of time at
which each burst formed at the second outbound port according to the specified size
is to arrive at core node 2720B. As illustrated in FIG. 30, the outbound port of the
edge node 2710 is continually transmitting bursts according to the latest permits it
receives. A control burst 3030 need not be continuously sent if the flow-rate
specifications do not change. FIG. 30 illustrates two control bursts 3010 (3010A1
and 3010A2), two control bursts 3030 (3030B1 and 3030B2), two schedules 3020
(3020A1 and 3020A2), and two permits 3040 (3040B1 and 3040B2). Each permit
may contain schedules for several burst streams.

FIG. 31-A illustrates a first-mode burst transfer request 3110 (associated with
control bursts 2910 and 3010) sent from an outbound port of an edge node 2710 to a
core node 2720. The request includes a fixed mode identifier 3112 (set equal to zero
to denote mode 0), a burst identifier 3114 which can be a cyclic number identifying
each burst transmitted from the outbound port to the core node 2720, an identifier
3116 of the source edge node, an identifier 3118 of a destination (sink) edge node,
and the burst size 3120. FIG. 31-B illustrates a second-mode burst transfer request
3130 (associated with control burst 3030) sent from an outbound port of an edge
node 2710 to a core node 2720. The request includes a fixed mode identifier 3132
(set equal to one), a burst-stream identifier 3134 which can be a cyclic number
identifying each burst stream flowing from the outbound port to the core node 2720,
an identifier 3136 of the source edge node, an identifier 3138 of a destination (sink)
edge node, and a required flow rate 3140. Other fields may be added to both the first
mode and second mode requests for further control functions. Several requests 3110
may be aggregated in a control burst 2910/3010 (FIG. 29 and FIG. 30) and several
requests 3130 may be aggregated in a control burst 3030 (FIG. 30). The cyclic
number in field 3114 is bounded by the maximum number of bursts that may be
considered concurrently at a respective outbound port of the edge node. The cyclic
number in field 3134 is bounded by the maximum number of burst streams that may
be considered concurrently at a respective outbound port of the edge node.
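
The two request formats of FIG. 31-A and FIG. 31-B may be represented, for illustration, by the records below. The class and field names, the integer node identifiers, and the units (bits and bits per second) are assumptions made for this sketch; only the field reference numerals in the comments come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstModeRequest:        # request 3110
    burst_id: int              # field 3114, cyclic burst identifier
    source_edge: int           # field 3116
    sink_edge: int             # field 3118
    burst_size_bits: int       # field 3120
    mode: int = 0              # field 3112, fixed at zero


@dataclass(frozen=True)
class SecondModeRequest:       # request 3130
    stream_id: int             # field 3134, cyclic burst-stream identifier
    source_edge: int           # field 3136
    sink_edge: int             # field 3138
    flow_rate_bps: float       # field 3140
    mode: int = 1              # field 3132, fixed at one


class CyclicId:
    """Issues identifiers 0..modulus-1 repeatedly; modulus bounds concurrent items."""

    def __init__(self, modulus: int):
        self.modulus = modulus
        self._next = 0

    def next(self) -> int:
        value = self._next
        self._next = (self._next + 1) % self.modulus
        return value


burst_ids = CyclicId(modulus=4)
control_burst = [FirstModeRequest(burst_ids.next(), 7, 9, 120_000) for _ in range(5)]
```

In this sketch a control burst is simply a list of requests; the fifth request reuses identifier 0, which is acceptable only if the first burst is no longer under consideration.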

FIG. 32-A illustrates a response 3210 sent by the core controller of core node
2720 in response to first-mode burst transfer request 3110. The response includes a
fixed mode identifier 3212 (set equal to zero), a burst identifier 3214 which is the
identifier 3114 of the scheduled burst, and a burst-arrival time 3218, which is the
time at which the scheduled burst is to arrive at the core node 2720, as determined
by a master time counter at the core node. It is also the time, as determined by an
edge time counter at the edge node, at which the burst is to be transmitted, the edge
time counter being time-locked to the master (core) time counter.

FIG. 32-B illustrates a response 3230 sent by the core controller of the core
node 2720 in response to the second-mode burst transfer request 3130. The
response includes a fixed mode identifier 3232 (set equal to one), a burst-stream
identifier 3234 which corresponds to a burst-stream identifier 3134, a burst-size limit
3242, and an arrival time 3244 at which a burst belonging to burst stream 3234 and
having a size not exceeding the limit 3242 is to arrive at the core node 2720. Several
responses 3210 may be aggregated into a burst-transfer schedule 2920 (FIG. 29)
and several responses 3230 may be aggregated into a burst-transfer permit 3040
(FIG. 30). An outbound port of an edge node may transmit both first-mode (mode 0)
and second-mode (mode 1) requests and receive a grant burst containing responses
3210 and 3230.
Bimodal Core Controller

FIG. 33 illustrates a bimodal core controller 3300 of a core node adapted to 
receive optical control signals. The optical control signals, received at inlet ports of a 
core node, are directed to a controller where they are demodulated to extract and 
parse the baseband control signals. Each control signal includes an indication of
whether it is a first-mode control signal or a second-mode control signal. A first- 
mode control signal includes the size of the burst and its point of origin, amongst 
other data. A second-mode control signal includes a required flow-rate-allocation for 
a respective data stream. 

A multiplexer 3320 multiplexes control signals from input ports of the core
node. Multiplexer 3320 may be a separate physical entity receiving time-interleaved
optical control signals from the input ports of a core node structure of the type
described with reference to FIG. 16. Multiplexer 3320 may also be a representation
of a switching function within a switching fabric where input ports direct control
signals to a control output port of the switching fabric and the optical control signals
at the control output port are submitted to a core controller as in the structure of the
core node of FIG. 17.

A demultiplexer 3390 distributes control signals produced by core controller
3300 to output ports of the core node. Demultiplexer 3390 may be a separate
physical entity receiving optical control signals from a core controller through an
electrical-optical interface if the structure of the core node is of the type described
with reference to FIG. 16. Alternatively, demultiplexer 3390 may be a representation
of a switching function within a switching fabric where a control input port of the
switching fabric receives control signals from a core controller and distributes them,
through the switching fabric, to respective output ports of the switching fabric, as in
the core node described with reference to FIG. 17.

A burst-transfer request may be a first-mode (mode 0) or a second-mode
(mode 1) request. Burst-transfer-request processor 3330 determines the request
mode, sends a first-mode request directly to a burst scheduler 3370, and directs a
second-mode request to a burst-descriptor generator 3360. A first-mode request
specifies, amongst other descriptors, the size and destination of a burst. The burst
may belong to a burst stream that has been subjected to flow-rate regulation at an
outbound port of an edge node 2710. A second-mode request specifies, amongst
other descriptors, a destination and a flow-rate requirement for a burst stream and it
is the duty of burst-descriptor generator 3360 to specify an appropriate burst size,
and tentative arrival times at the core node of the specified bursts. Notably, the
burst-descriptor generator 3360 produces a continuous sequence of burst
descriptors that is modified or terminated only upon receipt of a new second-mode
request. The new second-mode request, associated with an existing data stream,
may require a change in the flow-rate allocation or may request terminating the data
stream by setting the flow-rate-allocation requirement equal to zero.
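
The division of work between the request processor and the burst-descriptor generator may be sketched as follows. This is a simplified illustration under stated assumptions: requests are plain dictionaries, a single nominal burst size is used for all streams, the inter-burst interval is taken as burst size divided by flow rate, and a modified stream restarts its sequence at time zero. None of these choices, nor any of the names, is prescribed by the specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class BurstDescriptor:
    ident: int                           # burst id (mode 0) or stream id (mode 1)
    size_bits: int
    tentative_arrival_s: Optional[float]  # None for already-waiting bursts


def generate_descriptors(stream_id: int, flow_rate_bps: float,
                         burst_bits: int) -> Iterator[BurstDescriptor]:
    """Endless sequence of evenly spaced descriptors for one burst stream."""
    interval_s = burst_bits / flow_rate_bps
    k = 0
    while True:
        yield BurstDescriptor(stream_id, burst_bits, k * interval_s)
        k += 1


class RequestProcessor:
    """Stand-in for processor 3330 feeding generator 3360 and scheduler 3370."""

    def __init__(self, nominal_burst_bits: int):
        self.nominal_burst_bits = nominal_burst_bits
        self.to_scheduler: List[BurstDescriptor] = []
        self.streams: Dict[int, Iterator[BurstDescriptor]] = {}

    def handle(self, request: dict) -> None:
        if request["mode"] == 0:
            # First mode: the waiting burst goes directly to the scheduler.
            self.to_scheduler.append(
                BurstDescriptor(request["burst_id"], request["size_bits"], None))
        elif request["flow_rate_bps"] == 0:
            # Second mode with zero rate: terminate the stream.
            self.streams.pop(request["stream_id"], None)
        else:
            # Second mode: start or replace the descriptor sequence.
            self.streams[request["stream_id"]] = generate_descriptors(
                request["stream_id"], request["flow_rate_bps"],
                self.nominal_burst_bits)

    def tick(self) -> None:
        """Submit the next generated descriptor of every active stream."""
        for sequence in self.streams.values():
            self.to_scheduler.append(next(sequence))


processor = RequestProcessor(nominal_burst_bits=1_000_000)
processor.handle({"mode": 1, "stream_id": 7, "flow_rate_bps": 1e9})
processor.tick()
processor.tick()
processor.handle({"mode": 0, "burst_id": 3, "size_bits": 40_000})
processor.handle({"mode": 1, "stream_id": 7, "flow_rate_bps": 0})
```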

For each data stream, the specified burst size and tentative arrival times are 
submitted to burst scheduler 3370. Burst scheduler 3370 need not be aware of the 
request mode and its function is to schedule the arrival time at the core node of the 
burst-transfer requests it receives. When the arrival times are decided, they are sent
to respective edge nodes 2710 which transmit the bursts at time instants that ensure
arrival at the core node at precisely the specified arrival instants. Time coordination 
may be realized by means of a prior-art time locking process described in United 
States patent 6,570,872. 

A burst-schedule transmitter 3380 forms burst-transfer schedules for
first-mode requests and burst-transfer permits for second-mode requests and
submits the schedules and permits to demultiplexer 3390 which directs them to 
respective edge nodes 2710. 

FIG. 34 illustrates the scheduling function at a core controller 2730 (FIG. 27
and FIG. 28). The controller 2730 of a core node receives from edge nodes 2710
flow-rate allocations for specified data streams (step 3410) and determines a
nominal burst size for each specified data stream (step 3412). The nominal burst
sizes may be determined as a function of flow-rate allocations. The core-node
controller 2730 includes a flow-rate regulator which attempts to generate
time-regulated burst descriptors, each descriptor including a burst size and a
corresponding data stream. The flow-rate regulator selects the dequeueing instants
so that the burst length is approximately equal to the flow-rate allocation of a
corresponding burst stream multiplied by the inter-burst interval. A process of
determining dequeueing instants is described in the aforementioned Applicant's
United States patent application serial number 10/437,676. Thus, in step 3414,
time-regulated bursts are generated for each specified data stream and the descriptors
are sorted according to the output ports of the core node to which they are routed
towards respective destination sink nodes (step 3416). The burst descriptors of
different data streams still have to be scheduled to prevent contention at the output
ports of the core node.

The descriptors sorted in step 3416 may be submitted directly to the
scheduler. However, a core-node controller may also receive burst-transfer requests
from some outbound ports of edge nodes where bursts are formed first at the
outbound ports as indicated in step 3420. Each burst-transfer request includes a
burst size and an indication of a destination sink node. In step 3426, the
burst-transfer requests may be sorted according to the outlet ports of the core node and
the sorted burst-transfer requests are merged with the burst descriptors sorted in
step 3416 into output queues, one output queue per output port of the core node
(step 3430). The scheduler then determines a required arrival time, at the core node,
for each burst for which a burst descriptor or a burst-transfer request has been
issued (step 3440). The scheduling process (step 3440) may introduce some delay
jitter due to output contention.
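
Steps 3430 and 3440 may be illustrated by the sketch below, which merges burst entries of both modes into one queue per output port and assigns non-overlapping arrival times. It uses a simple earliest-first greedy rule chosen for this example; it is not the scheduler of application 09/750,071, and the tuple layout, names, and port rate are assumptions.

```python
from collections import defaultdict


def schedule(bursts, port_rate_bps):
    """Assign contention-free arrival times per output port.

    bursts: iterable of (burst_id, output_port, size_bits, earliest_arrival_s),
    mixing first-mode requests and second-mode generated descriptors.
    Returns a dict mapping burst_id to its scheduled arrival time in seconds.
    """
    # Step 3430 (sketch): one output queue per output port of the core node.
    queues = defaultdict(list)
    for burst in bursts:
        queues[burst[1]].append(burst)

    # Step 3440 (sketch): serve each queue in order of earliest arrival time.
    arrival = {}
    for queue in queues.values():
        port_free_at = 0.0
        for burst_id, _port, size_bits, earliest in sorted(queue, key=lambda b: b[3]):
            start = max(earliest, port_free_at)   # waiting here is the delay jitter
            arrival[burst_id] = start
            port_free_at = start + size_bits / port_rate_bps
    return arrival


times = schedule(
    [("a", 1, 10_000, 0.0), ("b", 1, 10_000, 0.0), ("c", 2, 10_000, 0.0)],
    port_rate_bps=10e9,
)
```

In this example bursts "a" and "b" contend for output port 1, so "b" is deferred by the duration of "a", while "c" on port 2 is unaffected.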

An upstream channel from an outbound port of an edge node to an inlet port 
of a core node may carry several data streams, each bound to a respective 
destination (sink) node. A data stream may include bursts defined by the outbound 
port. A data stream may also include bursts defined by the controller of the core 
node. A dual-mode of burst switching can be realized as described below.

If an edge node is within a predefined proximity to a given core node, it is
preferable that a controller of an outbound port determine the total
flow-rate-allocation requirement of each data stream carried by the upstream channel
originating from the outbound port, compute a nominal burst size based on the
flow-rate requirement, assemble bursts each having a size that does not exceed the
computed nominal burst size, and communicate the actual size of the assembled
burst to the core-node controller as described with reference to FIG. 27 and FIG. 28.
The core-node controller would then schedule the burst and communicate the
schedule to the edge node. An outbound port then transmits the bursts according to
the received schedule.

If an edge node is beyond the predefined proximity to a given core node, it is
preferable that a controller of an outbound port determine the total
flow-rate-allocation requirement of each data stream carried by the upstream link, and
let the controller of the core node compute a nominal burst size based on the flow-rate
requirement and generate a stream of bursts for each data stream. The core-node
controller then schedules each generated burst and communicates the schedule to
the edge node. The outbound port of the edge node then aggregates and assembles
waiting packets of each data stream to form bursts where the size of each does not
exceed a nominal burst size determined by the core-node controller and transmits
the bursts according to schedule.
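
The choice between the two modes, and the nominal burst size used in either mode, can be sketched as below. The sketch assumes that proximity is measured as one-way propagation delay against a threshold, which the text does not specify; the function names and numeric values are hypothetical. The nominal burst size follows the relation stated earlier (flow-rate allocation multiplied by the inter-burst interval).

```python
def select_mode(propagation_delay_s, proximity_threshold_s):
    """Return 1 if the edge node is within the predefined proximity, else 2."""
    return 1 if propagation_delay_s <= proximity_threshold_s else 2


def nominal_burst_size_bits(flow_rate_bps, inter_burst_interval_s):
    """Nominal burst size: flow-rate allocation times inter-burst interval."""
    return flow_rate_bps * inter_burst_interval_s


# Assumed values for illustration only.
near = select_mode(propagation_delay_s=0.0005, proximity_threshold_s=0.001)
far = select_mode(propagation_delay_s=0.020, proximity_threshold_s=0.001)
size = nominal_burst_size_bits(flow_rate_bps=820e6, inter_burst_interval_s=100e-6)
```

With these assumed values, the nearby edge node uses the first mode, the distant one uses the second mode, and a stream allocated 820 Mb/s with a 100-microsecond inter-burst interval has a nominal burst size of about 82,000 bits.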

As described earlier, an edge-node controller may select the burst-transfer
mode according to other criteria, such as delay tolerance and storage capacities at
the edge. The type of data or terms of a service-level agreement may also influence
the burst-transfer-mode selection.

It is emphasized that the burst size in the first mode corresponds to a burst
that is already waiting at the output port while a burst size in the second mode is the
size of a burst to be formed from data packets anticipated to be present and
assembled at the outbound port before the transmission time indicated in the
schedule computed by the core node.

It is also noted that, in the second mode, a data stream may change its
flow-rate allocation requirement to zero, and the core-node controller would then
generate and schedule only control bursts for the data stream. Control bursts are
generally very short and sent at a low rate, thus consuming a negligible fraction of the
capacity of the channels. The main reasons for maintaining a connection despite its
zero flow-rate requirement are to maintain the time locking of the outbound port to the
core node to which it connects through an upstream and a downstream channel and
to communicate a required flow-rate allocation when needed. Regardless of the
proximity to a core node, each edge node must have means for time locking to the
core node to ensure that a burst transferred from each edge node is timed to arrive
at the core node at the instants of time specified by a schedule.

FIG. 35 illustrates the process of forming data bursts at an outbound port
1526 of an edge node 1510 (FIG. 15) and storing the data bursts according to a
corresponding burst-stream mode and destination. Burst selection for transmission
to the core node is performed according to the burst-transfer mode and burst
destination indicated in a burst schedule. A burst-formation device 3520 forms bursts
for either mode of burst transfer. A formed burst is directed to a buffer 3512 or 3514
through a selector 3524. If the first mode is used, the burst-formation device 3520
forms bursts from data packets it receives from ingress ports and from inbound ports,
then sends scheduling requests to an external node selected according to a routing
plan. A formed burst is then placed, through selector 3524, in a first-mode buffer
3512 corresponding to the destination of the burst. Notably, inbound ports of an edge
node receive data bursts that may be parsed at the inbound ports into their
constituent data packets and the constituent data packets may be switched to either
egress ports 1524 or outbound ports 1526 (FIG. 15). If the second mode is used,
the burst-formation device 3520 receives, from a core node, burst-transfer permits
each specifying a burst-size limit, then forms bursts each having a size not exceeding
the specified limit. A formed burst is then placed, through selector 3524, in a
second-mode buffer 3514 corresponding to the burst's destination.

Thus, two sets of buffers are provided, one corresponding to the first mode 
and the other to the second mode. Each set of buffers includes one buffer for each 
destination. With v destinations, there are 2v buffers. It is noted that, at any instant of
time, only one burst can be transmitted from the outbound port and, hence, the 2v
buffers are preferably logical buffers sharing the same memory device.
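
One possible way to model the 2v logical buffers is a single table of queues keyed by mode and destination, as sketched below. The class and method names are hypothetical and the sketch does not model the shared memory device itself.

```python
from collections import deque


class OutboundPortBuffers:
    """Logical buffers of one outbound port: one queue per (mode, destination)."""

    def __init__(self, destinations):
        # Two modes times v destinations gives 2v logical queues.
        self.queues = {
            (mode, dest): deque() for mode in (1, 2) for dest in destinations
        }

    def store(self, mode, destination, burst):
        """Place a formed burst in the buffer for its mode and destination."""
        self.queues[(mode, destination)].append(burst)

    def select(self, mode, destination):
        """Dequeue the burst named by a schedule entry, or None if none is waiting."""
        queue = self.queues[(mode, destination)]
        return queue.popleft() if queue else None


buffers = OutboundPortBuffers(["sink-A", "sink-B", "sink-C"])
buffers.store(1, "sink-B", b"first-mode burst")
picked = buffers.select(1, "sink-B")
empty = buffers.select(2, "sink-B")
```

Because only one burst leaves the outbound port at a time, a single selector over these queues suffices.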

Multi-source connection 

Connection requests can be classified in three categories, labeled type-1,
type-2, and type-3. A type-1 connection relates to an individual burst-transfer
(one-to-one) request using the first mode of burst transfer. A type-2 connection relates
to a nodal connection request where a node may multi-cast to two or more other nodes
(one-to-many) using the second mode of burst transfer. A type-3 connection relates
to a network connection specifying several node pairs (many-to-many).

Control bursts sent by the edge nodes are scheduled to arrive at the
core-node controller during non-overlapping intervals. In accordance with the present
invention, three types of control bursts are provided. The first type corresponds to
the first mode of burst switching while the second and third types correspond to the
second mode of burst switching. A control burst of the first type requests scheduling
individual bursts. Flow-rate control of the bursts may be applied at the source edge
node. A control burst of the second type, sent by a particular edge node, requests
adjusting the flow-rate allocation for each traffic stream originating from the particular
edge node. A control burst of the third type, sent by a particular edge node, requests
adjusting flow-rate allocations for traffic streams defined by different source nodes
and sink nodes. A new flow-rate allocation adjusts the flow-rate requirement from
zero to a specified value. Termination of allocated capacity of a path from a source
edge node to a sink edge node is realized by adjusting a respective flow-rate
allocation to equal zero. Hereinafter, a control burst of the second type is called a
nodal request and a control burst of the third type is called a network request. A
network request controls an entire embedded network within a larger parent network.

In summary, three types of connection requests are defined: 

(1) a type-1 connection request relates to an individual burst, and hence
to the first mode of burst switching;

(2) a type-2 connection request relates to at least one burst stream from
a given source node to at least one sink node, each stream being
allocated a respective flow-rate; and

(3) a type-3 connection request relates to at least one burst stream from
each of a plurality of source nodes to at least one sink node, each
stream being allocated a respective flow-rate.

FIG. 36 illustrates a network of eight edge nodes referenced as 3610-1 to
3610-8, and two core nodes 3620-1 and 3620-2. A network 3600 can be divided into
several embedded networks, also called virtual networks, each covering
selected edge nodes 3610 with virtual links interconnecting the selected edge nodes
and having capacities that can be adapted in real time to follow traffic-pattern
variations. At least one of the edge nodes may support an embedded-network
controller adapted to form multiple-source flow-rate allocation requests, each of the
requests specifying flow-rate allocations to a plurality of paths from several source
nodes to several sink nodes. The embedded network can be established through
several core nodes, each having a burst-transfer regulator. The embedded-network
controller may modify the flow-rate allocation requests according to traffic
data it receives from the edge nodes constituting the embedded network. The
embedded-network controller receives traffic data from source nodes that constitute
the embedded network.

In the example of FIG. 36, edge node 3610-2 is selected to host a
virtual-network controller 3650. The virtual-network controller sends a type-3 request to
modify the capacity allocation (flow-rate allocation) of a virtual network that covers
five edge nodes: 3610-1, 3610-2, 3610-3, 3610-4, and 3610-7. The required
flow-rate allocations are indicated in the virtual-network-capacity matrix of FIG. 37.

Each core node has a core controller for scheduling capacity requirements 
within the core node and communicating specifics of scheduled capacity to 
respective exclusive configuration controllers. Each embedded network has an 
exclusive configuration controller. An exclusive configuration controller may be
associated with a selected edge node or a selected core node. The exclusive
configuration controller may also comprise control modules associated with a 
selected edge node and a selected core node. A core node may host several 
exclusive configuration controllers or several modules of exclusive configuration 
controllers. 

FIG. 37 illustrates flow-rate-allocation requirements for selected pairs of edge
nodes 3610-1, 3610-2, 3610-3, 3610-4, and 3610-7 that constitute a virtual network.
The flow rates are expressed in an arbitrary unit. For example, if the unit is selected
to be one Mb/s, then the traffic from node 3610-1 to node 3610-7 requires 820 Mb/s.
The flow-rate-allocation requirements are based on exchange of traffic-volume (flow
rate) information between the virtual-network controller 3650 and each of the edge
nodes that are members of the virtual network. The flow-rate-allocation requirements
may change frequently as the spatial traffic distribution changes with time. To be
responsive, the virtual network should be able to reconfigure rapidly, with a
reconfiguration latency determined primarily by the propagation delays for the node
pairs affected by reconfiguration. A virtual network covering the North American
continent, for example, may be reconfigured in less than 100 milliseconds. The
flow-rate allocation for a node pair may vary from zero to a maximum determined by the
available free capacity of the shared parent network.

FIG. 38 illustrates a virtual network created through the two core nodes
3620-1 and 3620-2 of a composite-star network 3600 with the flow-rate allocations for
pairs of edge nodes corresponding to demand matrix 3700. The virtual-network
controller may base its flow-rate allocation matrix 3700 on estimated traffic load of
the virtual network, taking into consideration the possibility of tandem switching at
edge nodes 3610. For example, a connection from edge node 3610-1 to edge node
3610-4, where a direct path through the core nodes is not provided, may be made
through edge node 3610-7. The virtual network comprises edge nodes 3610-1,
3610-2, 3610-3, 3610-4, and 3610-7, and its controller is connected to edge node
3610-2 as described above.

To summarize, a parent network 3600 comprises edge nodes 3610 each
having an edge controller (not illustrated in FIG. 36, see FIG. 15) and core nodes
3620 each having a core controller (not illustrated in FIG. 36, see FIG. 17). Links
3622 connect edge nodes 3610 to core nodes 3620. Preferably, each edge controller
is adapted to form multiple-source flow-rate-allocation requests where each
multiple-source flow-rate-allocation request specifies flow-rate allocations from two or
more edge nodes to at least one edge node.

An edge node forming multiple-source flow-rate-allocation requests selects a
core node for each request. The requests may, however, be directed to more than
one core node. At least one core controller includes burst-descriptor generator 3360
(FIG. 33, not illustrated in FIG. 36) that generates burst descriptors complying with a
flow-rate specification. A burst-descriptor generator may use a burst-transfer
regulator described in the aforementioned United States Patent application
10/437,676.

An embedded network, such as network 3800, may be dynamically created
to include a subset of edge nodes 3610. Each edge node 3610 in the subset
communicates with a controller 3650 of the embedded network. The
embedded-network controller 3650 modifies the flow-rate allocation requests according
to traffic data it receives from at least one of the edge nodes 3610. In one embodiment,
one of the edge nodes 3610 may host the embedded-network controller 3650. In another
embodiment, one of the core nodes 3620 may host the embedded-network controller
3650.

FIG. 39 illustrates another virtual network created through the two core nodes
3620-1 and 3620-2 with the flow-rate allocations for pairs of edge nodes
corresponding to a demand matrix similar in structure to matrix 3700 of FIG. 37. The
virtual network comprises edge nodes 3610-1, 3610-2, 3610-3, 3610-5, 3610-6, and
3610-8, and its controller is connected to edge node 3610-1.

FIG. 40 illustrates the logic used by a mechanism provided at a core-node
controller for processing connection requests of different types. If a received
connection request code is '00' (type-1), step 4020 directs the request to step 4080
for burst scheduling. If the code is '10', the controller interprets the request as a
flow-rate-allocation request and directs the request to step 4040 where burst
descriptors are generated, then submitted to the scheduler in step 4080. If the code is
'11', the received request is interpreted as a multiple-source multiple-destination
request. The request is parsed into constituent single-source multiple-destination
requests and each of the constituent requests is submitted to the burst scheduler,
which issues burst-transfer permits and distributes them to respective edge nodes.
Each single-source multiple-destination request is treated at a respective core node as
multiple single-source single-destination requests.
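
The dispatch logic described above can be sketched as follows. The dictionary layout of a request and the names split_network_request and process are assumptions made for illustration; the sketch only records which step each request would be directed to and does not perform any scheduling.

```python
def split_network_request(request):
    """Split a type-3 (network) request into single-source (nodal) requests."""
    return [
        {"code": "10", "source": source, "flows": flows}
        for source, flows in request["allocations"].items()
    ]


def process(request):
    """Return the (step, request) actions a core-node controller would take."""
    code = request["code"]
    if code == "00":
        # Type-1: go straight to burst scheduling (step 4080).
        return [("schedule_bursts", request)]
    if code == "10":
        # Type-2: generate burst descriptors (step 4040), then schedule them.
        return [("generate_descriptors", request)]
    if code == "11":
        # Type-3: handle each constituent single-source request in turn.
        actions = []
        for nodal in split_network_request(request):
            actions.extend(process(nodal))
        return actions
    raise ValueError(f"unknown request code: {code!r}")


network_request = {
    "code": "11",
    "allocations": {
        "3610-1": [("3610-2", 242)],
        "3610-2": [("3610-3", 920)],
    },
}
actions = process(network_request)
```

Here the type-3 request yields two nodal requests, one per source node, each handled exactly as a type-2 request would be.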

Using techniques well known in the art, the virtual-network controller acquires
an awareness of the network topology and, hence, can divide the multiple-source
requests among the core nodes. The virtual-network controller preferably uses an
optimizing algorithm to create an order of preference of core nodes for each
edge-node pair.

A route set is a listing of selected routes from the source edge node to the
sink edge node. A source edge node and a sink edge node define a "node pair" and
a route set is associated with each node pair. In order to facilitate the process of
finding a route having a sufficient free capacity to accommodate a connection
request, a route set for each edge-node pair is created and sorted according to a
prescribed criterion. The derivation of the route set is an "off-line" process that is
executed only whenever relevant topological changes take place. The routes in a
route set are sorted according to some route merit index. Each edge node maintains
a route set and communicates same to the controller of the virtual network. The
network 3600 may include several virtual networks, each having its own
virtual-network controller 3650. An edge node may participate in more than one virtual
network.
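
A sorted route set lends itself to a first-fit search, sketched below. The text does not define the route merit index, so the sketch assumes hop count with lower values preferred; the link names and capacities are hypothetical.

```python
def build_route_set(routes, merit):
    """Sort candidate routes by a merit index; lower is assumed to be better."""
    return sorted(routes, key=merit)


def first_fit(route_set, free_capacity, required):
    """Return the first route whose links all have enough free capacity, else None."""
    for route in route_set:
        if all(free_capacity[link] >= required for link in route):
            return route
    return None


# Each route is a tuple of hypothetical link names.
routes = [
    ("e1-c1", "c1-e7", "e7-c2", "c2-e4"),  # tandem route through an intermediate edge node
    ("e1-c2", "c2-e4"),                    # direct route through one core node
]
route_set = build_route_set(routes, merit=len)
free = {"e1-c1": 900, "c1-e7": 900, "e7-c2": 900, "c2-e4": 900, "e1-c2": 100}
chosen = first_fit(route_set, free, required=500)
```

The direct route is tried first because it ranks higher, but its first link lacks capacity, so the search falls through to the tandem route.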

FIG. 41 illustrates the format of a control burst. A control burst includes two
bits that identify the control-burst type. The first bit identifies the burst-switching
mode; for example, '0' indicates the first mode and '1' indicates the second mode.
The second bit determines whether the request is a nodal request or a network
request. Thus, the first two bits may be selected as '00' indicating first-mode
burst switching, '10' indicating second-mode nodal switching, or '11' indicating a
second-mode network request. A first-mode request is always a nodal request. A
first-mode network request is meaningless because it would require knowledge of the
burst sizes waiting at different edge nodes, other than the reporting edge node, and
such information would not be available to the reporting edge node. For both type-2 and
type-3 requests, releasing allocated capacity is done by specifying a flow-rate
requirement of zero. One important advantage of second-mode burst switching is the
ease of increasing and reducing flow-rate allocations for edge-node pairs.

Following the 2-bit request-type field '00', a type-1 request has a field 4112
that contains an identifier of the reporting edge node and a field 4114 that indicates
the number x of destinations for which burst scheduling is requested. These are
followed by x records, each record including a cyclic burst number 4116, relevant to
the reporting edge node, a sink edge-node number 4117 to which a burst is destined
from the reporting edge node, and a burst size 4118 expressed in a prescribed unit,
for example bytes or a block of bytes.

Following the 2-bit request-type field '10', a type-2 request has a field 4122
that contains an identifier of the reporting edge node and a field 4124 that indicates
the number k of destinations for which burst-transfer permits are requested. These
are followed by k records, each record including a sink edge-node number 4125 to
which a stream of bursts is destined from the reporting edge node and a required
flow-rate 4126 expressed in a prescribed unit, for example units of 16 kilobits per
second.

Following the 2-bit request-type field '11', a type-3 request has a field 4132
that contains an identifier of the reporting edge node and a field 4134 that indicates
the number of source nodes for which flow-rate allocations to specified sink nodes
are requested. These are followed by one super record per source node, each super
record including a field 4136 indicating an identifier of a source node followed by a
field 4137 indicating the number k of destination sink nodes for which flow-rate
allocations from the source node are requested. These two fields are followed by k
records, each including a sink edge-node number 4138 to which a stream of bursts is
destined from the source node corresponding to the super record, and a required
flow-rate 4139 expressed in a prescribed unit.

FIG. 42A and FIG. 42B illustrate two type-3 requests sent to core nodes
3620-1 and 3620-2, respectively, in order to realize the flow-rate-allocation requirements
of matrix 3700 (FIG. 37). The virtual-network controller 3650 is aware of the topology of
network 3600 and, in this example, it decided to seek capacity allocation through the
two core nodes 3620-1 and 3620-2 according to network connectivity. The request of
FIG. 42A, which is directed to the controller of core node 3620-1, indicates, in a first
field 4132, that the reporting source node is 3610-2, and in a second field 4134 it
indicates that four super records follow. The first super record corresponds to
source node 3610-1 (field 4136) and specifies three destinations (field 4137). The
three destinations (field 4138) are edge nodes 3610-2, 3610-3, and 3610-7, and the
required flow-rates (field 4139) are 242, 1922, and 820 units, respectively. The second
super record corresponds to source node 3610-2 (field 4136) and specifies two
destinations (field 4137), 3610-3 and 3610-7, with flow-rate requirements of 920 units
and zero. Notably, the zero flow-rate requirement is sent by the virtual-network
controller to release an existing path between source node 3610-2 and sink node
3610-7. The third and fourth super records are likewise interpreted.
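
The nesting of a type-3 request can be illustrated with a small parser. This is a sketch under stated assumptions: the text gives no field widths, so the request is modeled as a flat list of integers that follows the field order after the 2-bit type code; node 3610-n is abbreviated to n; and only the first two super records described above are included, so the source count is set to 2 rather than four. The name parse_type3 is hypothetical.

```python
def parse_type3(fields):
    """Parse the fields of a type-3 request into (reporting node, allocations)."""
    it = iter(fields)
    reporting_node = next(it)       # field 4132
    source_count = next(it)         # field 4134
    allocations = {}
    for _ in range(source_count):
        source = next(it)           # field 4136
        sink_count = next(it)       # field 4137
        flows = []
        for _ in range(sink_count):
            sink = next(it)         # field 4138
            rate = next(it)         # field 4139
            flows.append((sink, rate))
        allocations[source] = flows
    return reporting_node, allocations


fields = [
    2, 2,                            # reporting node 3610-2, two super records
    1, 3, 2, 242, 3, 1922, 7, 820,   # source 3610-1 with three destinations
    2, 2, 3, 920, 7, 0,              # source 3610-2 with two destinations
]
reporting_node, allocations = parse_type3(fields)
```

The zero rate in the last record is carried through unchanged; interpreting it as a release is left to the core-node controller.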

The second type-3 request of FIG. 42B is sent from reporting source node
3610-2 to the controller of core node 3620-2. The parsing of the request follows the
same process used to parse the request of FIG. 42A.

A connection request can specify zero capacity requirement and, therefore,
there is no need to send a release request when a connection is terminated. For
example, in matrix 3700, the flow-rate allocations for node pairs (3610-2, 3610-7)
and (3610-4, 3610-7) are set equal to zero to indicate a request to terminate
respective connections.
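
Treating a zero flow-rate as a release can be sketched as a single update rule on an allocation table. The table layout, the function name apply_flows, and the pre-existing allocation of 300 units are assumptions made for illustration.

```python
def apply_flows(table, source, flows):
    """Update a (source, sink) -> flow-rate table; a zero rate releases the path."""
    for sink, rate in flows:
        if rate == 0:
            table.pop((source, sink), None)   # release; no separate message needed
        else:
            table[(source, sink)] = rate      # set up or adjust the allocation
    return table


# Hypothetical existing allocation for node pair (3610-2, 3610-7).
table = {("3610-2", "3610-7"): 300}
apply_flows(table, "3610-2", [("3610-3", 920), ("3610-7", 0)])
```

After the update, the path to 3610-7 is gone and the path to 3610-3 holds 920 units, so set-up, adjustment, and termination all use the same request format.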

In summary, a core node, or a core network comprising several core nodes,
may be shared by several independent networks each having its own control system.
An independent network, also called an embedded network or a virtual network,
comprises edge nodes, a dynamic share of the core, and an exclusive configuration
controller. An exclusive configuration controller may be centralized or distributed.
An exclusive configuration controller of a given independent network preferably has
means for determining flow-rate requirements for each directed pair of edge nodes in
the given independent network and means for determining capacity requirements of
paths between each edge node in the given independent network and each core
node. An edge node may not need a path to each core node. The configuration
controller communicates capacity requirements to respective edge nodes of its
independent network and to respective core nodes. The capacity requirements may
be determined by an automated network-sizing algorithm associated with the
configuration controller so that the capacity requirements can be updated to follow
spatial and temporal traffic-level fluctuations.

The invention thus provides methods and apparatus for controlling the
transfer of data bursts of variable sizes so that data bursts traversing a network path
from a source node to a sink node are constrained by an allocated bit rate. While
data bursts are segmented and, where necessary, null-padded to facilitate switching
at edge nodes, the data bursts are transferred across a network in their native form
and rate regulated as such. The invention also enables the configuration of a



